Concept: Prepare Recipe
This content is also included in the free Dataiku Academy course, Basics 102, which is part of the Core Designer learning path. Register for the course there if you’d like to track and validate your progress alongside concept videos, summaries, hands-on tutorials, and quizzes.
The Prepare recipe is a visual recipe in DSS that allows you to create data cleansing, normalization, and enrichment scripts in an interactive way.
This is achieved by assembling a series of transformation steps from a library of more than 90 processors. Most processors are designed to handle one specific task, such as filtering rows, rounding numbers, extracting regular expressions, concatenating or splitting columns, and much more.
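To make the idea of a script concrete, here is a minimal Python sketch of single-purpose steps applied in order. It is only a conceptual model: the names `filter_rows`, `round_column`, `concat_columns`, and `apply_script` are hypothetical and are not part of the DSS processor library or API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def filter_rows(column: str, keep: Callable[[object], bool]) -> Step:
    """Build a step that keeps only rows whose value in `column` passes `keep`."""
    return lambda rows: [r for r in rows if keep(r[column])]


def round_column(column: str, digits: int) -> Step:
    """Build a step that rounds the numeric values in `column`."""
    return lambda rows: [{**r, column: round(r[column], digits)} for r in rows]


def concat_columns(out: str, cols: List[str], sep: str) -> Step:
    """Build a step that joins several columns into a new column `out`."""
    return lambda rows: [
        {**r, out: sep.join(str(r[c]) for c in cols)} for r in rows
    ]


def apply_script(rows: List[Row], script: List[Step]) -> List[Row]:
    """Apply each step in order; each step sees the previous step's output."""
    for step in script:
        rows = step(rows)
    return rows


# Hypothetical data, for illustration only.
rows = [
    {"first": "Ada", "last": "Lovelace", "score": 91.256},
    {"first": "Alan", "last": "Turing", "score": 48.5},
]

script = [
    filter_rows("score", lambda s: s >= 50),
    round_column("score", 1),
    concat_columns("full_name", ["first", "last"], " "),
]

result = apply_script(rows, script)
```

Because each step does one thing, the order of the list matters: here the filter runs before the rounding and concatenation, so later steps only process the rows that survived it.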
In addition to directly adding steps from the processor library, you can add steps to the script in a number of other ways.
From a column's context menu, DSS suggests steps based on the column's meaning. For example, it may suggest removing rows whose values are invalid for that meaning.
Another method to add steps to the script is through the Analyze window. Within a Prepare recipe, the Analyze window can guide data preparation, for example merging categorical values.
You can also directly drag columns to adjust their order, or switch from the Table view to the Columns view to apply certain steps to more than one column at a time.
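As an illustration of the meaning-based suggestion described above, the following Python sketch removes rows whose values are invalid for a given meaning. The `VALIDATORS` table, the meaning names, and `remove_invalid_rows` are assumptions made for this example; the meanings and validation rules in DSS are richer and are not reproduced here.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical validators keyed by a meaning name (an assumption for this sketch).
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "integer": lambda v: re.fullmatch(r"[+-]?\d+", v) is not None,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def remove_invalid_rows(rows: List[Row], column: str, meaning: str) -> List[Row]:
    """Keep only rows whose value in `column` is valid for `meaning`."""
    is_valid = VALIDATORS[meaning]
    return [r for r in rows if is_valid(str(r[column]))]


# Hypothetical data, for illustration only.
rows = [{"age": "42"}, {"age": "n/a"}, {"age": "-7"}]
cleaned = remove_invalid_rows(rows, "age", "integer")
```

In this sketch, the row containing `"n/a"` is dropped because it does not match the integer pattern, while the other two rows are kept.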
When adding new steps to the script, you’ll notice how the step output is immediately visible. This is possible because the step is being applied to the same sample of the dataset found in the Explore tab. The quick feedback allows you to work incrementally, quickly modifying your transformation steps.
Notice that steps in the script constitute a list of instructions. These instructions are not immediately applied to the dataset itself. For example, adding a “Delete Column” step removes that column from the step preview, but it does not actually delete the column in the dataset, as it would in a spreadsheet. Only when you choose to actually run the recipe will DSS execute the instructions on the full input dataset, and thereby produce a new output dataset.
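The difference between previewing a script and running the recipe can be sketched in a few lines of Python. This is a conceptual model under stated assumptions, not how DSS is implemented: `delete_column`, `run`, the sample size, and the data are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def delete_column(column: str) -> Step:
    """Build a step that returns copies of the rows without `column`."""
    return lambda rows: [
        {k: v for k, v in r.items() if k != column} for r in rows
    ]


def run(rows: List[Row], script: List[Step]) -> List[Row]:
    """Apply every step in order and return a new list of rows."""
    for step in script:
        rows = step(rows)
    return rows


# Hypothetical input dataset and a small sample taken from its first rows.
dataset = [{"id": i, "tmp": i * 2} for i in range(1000)]
sample = dataset[:10]

# The script is only a list of instructions; building it changes nothing.
script = [delete_column("tmp")]

# Preview: the script is evaluated on the sample only, so feedback is fast.
preview = run(sample, script)

# Running the recipe: the script is evaluated on the full input
# and produces a separate output. The input keeps its "tmp" column.
output = run(dataset, script)
```

After both calls, `dataset` still contains the `tmp` column: the steps only ever produce new rows, which mirrors how the recipe writes to an output dataset rather than editing its input.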
If a script starts to grow in complexity, a number of features can help you manage it.
You can disable steps.
You can organize individual steps into groups of steps.
You can add colors and comments to steps in order to send reminders to yourself and colleagues.
You can copy and paste steps within the same recipe or into another recipe, even one in another project or on another DSS instance.
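The management features listed above can be pictured as metadata attached to each step. The Python sketch below models disabled steps, groups, and comments. The `Step` and `Group` classes, their fields, and the rule that a disabled group skips all of its steps are assumptions made for illustration; they do not describe the format DSS uses to store a script.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Row = Dict[str, object]


@dataclass
class Step:
    name: str
    apply: Callable[[List[Row]], List[Row]]
    disabled: bool = False  # a disabled step stays in the script but is skipped
    comment: str = ""       # free-text note for yourself or colleagues


@dataclass
class Group:
    name: str
    steps: List[Step] = field(default_factory=list)
    disabled: bool = False  # assumption: disabling a group skips all its steps


def run(rows: List[Row], script: List[Union[Step, Group]]) -> List[Row]:
    """Apply the enabled steps in order, descending into enabled groups."""
    for item in script:
        if item.disabled:
            continue
        steps = item.steps if isinstance(item, Group) else [item]
        for step in steps:
            if not step.disabled:
                rows = step.apply(rows)
    return rows


strip = Step(
    "strip whitespace",
    lambda rows: [{**r, "name": r["name"].strip()} for r in rows],
    comment="Names arrive padded with spaces.",
)
upper = Step(
    "uppercase",
    lambda rows: [{**r, "name": r["name"].upper()} for r in rows],
    disabled=True,  # kept for later, but not applied
)

script = [Group("clean names", [strip, upper])]
result = run([{"name": "  ada "}], script)
```

Here only the whitespace step takes effect, because the uppercase step is disabled. Re-enabling it is a one-flag change, which is the point of disabling a step rather than deleting it.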