Concept | Prepare recipe
The Prepare recipe is a visual recipe in Dataiku that lets you build data cleansing, normalization, and enrichment scripts visually and interactively.
Adding transformation steps to the script
To prepare your data, you must add steps to the recipe script.
Using the processor library
An essential advantage of the Prepare recipe is its library of around 100 data processors. Most processors handle one specific task, such as filtering rows, rounding numbers, extracting text with regular expressions, and concatenating or splitting columns.
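To picture how single-purpose processors combine into a script, here is a minimal plain-Python sketch. It is not Dataiku code: each hypothetical function (`filter_rows`, `round_column`, `extract_with_regex`, `concat_columns`) only mimics the behavior of one kind of processor, taking a list of row dictionaries and returning a new list without modifying its input.

```python
import re
from typing import Callable

Row = dict[str, object]


def filter_rows(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Keep only the rows for which predicate returns True."""
    return [row for row in rows if predicate(row)]


def round_column(rows: list[Row], column: str, digits: int) -> list[Row]:
    """Round the values of one column to the given number of digits."""
    return [{**row, column: round(float(row[column]), digits)} for row in rows]


def extract_with_regex(rows: list[Row], column: str, pattern: str, new_column: str) -> list[Row]:
    """Store the first capture group of pattern in new_column (None when there is no match)."""
    compiled = re.compile(pattern)
    out = []
    for row in rows:
        match = compiled.search(str(row[column]))
        out.append({**row, new_column: match.group(1) if match else None})
    return out


def concat_columns(rows: list[Row], columns: list[str], new_column: str, sep: str = " ") -> list[Row]:
    """Join several columns into a new column, separated by sep."""
    return [{**row, new_column: sep.join(str(row[c]) for c in columns)} for row in rows]


# Hypothetical sample data.
rows = [
    {"first": "Ada", "last": "Lovelace", "price": "12.3456", "email": "ada@example.com"},
    {"first": "Alan", "last": "Turing", "price": "0.5", "email": "alan@example.org"},
]

# Chain the single-purpose steps, one after another.
result = filter_rows(rows, lambda r: float(r["price"]) > 1)
result = round_column(result, "price", 2)
result = extract_with_regex(result, "email", r"@(.+)$", "domain")
result = concat_columns(result, ["first", "last"], "full_name")
print(result)
```

Keeping each function to one narrow job is what makes the steps easy to reorder, remove, or recombine, which is the same design idea behind the processor library.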
Combined, processors let you perform a wide variety of tasks. For example, the Formula processor provides an expression language, similar to formulas in a spreadsheet program, that you can use to create new columns from existing ones by drawing on a range of built-in functions.
Another processor lets you write a Python function that is applied to each row.
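As a rough illustration of the Python processor, the sketch below assumes its single-cell mode, in which you define a function named `process` that receives the current row as a dictionary mapping column names to values and returns one value for the output column. The column names `quantity` and `unit_price` are hypothetical, and because cell values may arrive as strings or be empty, the function converts them defensively.

```python
def process(row):
    # Assumed contract: row is a dict of column name -> cell value,
    # and the return value fills the new output column for this row.
    # "quantity" and "unit_price" are hypothetical column names.
    try:
        # Treat missing or empty cells as 0; values may be strings.
        quantity = float(row.get("quantity") or 0)
        unit_price = float(row.get("unit_price") or 0)
    except ValueError:
        # Return None (an empty cell) when a value is not numeric.
        return None
    return round(quantity * unit_price, 2)


# Local check outside Dataiku: call the function on sample rows.
print(process({"quantity": "3", "unit_price": "2.50"}))  # 7.5
print(process({"quantity": "abc", "unit_price": "1"}))   # None
```

Testing the function locally on a few hand-written rows, as above, is a cheap way to catch conversion errors before running it over a whole dataset.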
In addition to directly adding steps from the processor library, you can add steps to the script in a number of other ways.
Using the Analyze window
You can also add steps to the script through the Analyze window.
Within a Prepare recipe, the Analyze window can guide data preparation, for example, by merging categorical values.
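Merging categorical values amounts to mapping several spellings of the same category onto one canonical label. The plain-Python sketch below illustrates only that idea; the `merge_values` helper and the sample mapping are hypothetical and do not reproduce the step the Analyze window actually generates.

```python
from collections import Counter


def merge_values(values: list[str], mapping: dict[str, str]) -> list[str]:
    """Replace each value found in mapping with its target label; leave other values unchanged."""
    return [mapping.get(value, value) for value in values]


# Hypothetical column with inconsistent spellings of the same categories.
countries = ["USA", "U.S.A.", "United States", "France", "france", "USA"]
mapping = {"U.S.A.": "USA", "United States": "USA", "france": "France"}

merged = merge_values(countries, mapping)
print(Counter(merged))  # Counter({'USA': 4, 'France': 2})
```

Counting values before and after the merge, as the last line does, is a quick way to confirm that variants were collapsed as intended.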
Manually moving the columns
You can also directly drag columns to adjust their order, or switch from the Table view to the Columns view to apply certain steps to more than one column at a time.
Previewing and applying the script
When you add a new step to the script, its output is immediately visible. This is possible because the step is applied to the same sample of the dataset shown in the Explore tab. This quick feedback lets you work incrementally, adjusting your transformation steps as you go.
Notice that steps in the script constitute a list of instructions. These instructions are not immediately applied to the dataset itself.
For example, adding a Delete Column step removes that column from the step preview, but it does not actually delete the column in the dataset, as it would in a spreadsheet.
Only when you run the recipe does Dataiku execute the instructions on the full input dataset and produce a new output dataset. The original input dataset remains unchanged.
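The difference between previewing a script and running a recipe can be modeled in a few lines of Python. This is a conceptual sketch, not how Dataiku is implemented: the script is a list of step functions, the hypothetical `preview` applies them to a sample (simplified here to the first rows), and the hypothetical `run` applies them to the full input to build a separate output, so the input rows are never modified.

```python
import copy
from typing import Callable

Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


def delete_column(name: str) -> Step:
    """Return a step that drops one column from every row, without mutating the input."""
    def step(rows: list[Row]) -> list[Row]:
        return [{k: v for k, v in row.items() if k != name} for row in rows]
    return step


def apply_script(script: list[Step], rows: list[Row]) -> list[Row]:
    # Work on a deep copy so the caller's rows are never modified.
    result = copy.deepcopy(rows)
    for step in script:
        result = step(result)
    return result


def preview(script: list[Step], dataset: list[Row], sample_size: int = 2) -> list[Row]:
    """Apply the script to a sample only (simplification: the first sample_size rows)."""
    return apply_script(script, dataset[:sample_size])


def run(script: list[Step], dataset: list[Row]) -> list[Row]:
    """Apply the script to the whole input and return a new output dataset."""
    return apply_script(script, dataset)


input_dataset = [{"id": i, "secret": f"s{i}", "value": i * 10} for i in range(5)]
script = [delete_column("secret")]

print(preview(script, input_dataset))  # 2 rows, no "secret" column
output_dataset = run(script, input_dataset)
print(len(output_dataset), "secret" in input_dataset[0])  # 5 True
```

Because the script is data (a list of instructions) rather than a sequence of edits already applied, the same list can be previewed repeatedly on a sample and then executed once on the full input.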
Managing the script
As a script grows in complexity, several features can help you manage it. You can:
Disable steps.
Organize individual steps into groups of steps.
Add colors and comments to steps to leave reminders for yourself and your colleagues.
Copy and paste steps within the same recipe or to another recipe, even if that recipe is in another project or another Dataiku instance.
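The effect of disabling and grouping steps can also be sketched in plain Python. In this hypothetical model (the `Step`, `Group`, `active_steps`, and `run` names are illustrative, not Dataiku's API), disabled steps are skipped, disabling a group skips every step inside it, and comments and colors are metadata that never affect the result.

```python
from dataclasses import dataclass, field
from typing import Callable, Union

Row = dict[str, object]


@dataclass
class Step:
    name: str
    func: Callable[[list[Row]], list[Row]]
    enabled: bool = True
    comment: str = ""  # free-text reminder, ignored when running
    color: str = ""    # display-only label, ignored when running


@dataclass
class Group:
    name: str
    steps: list["Item"] = field(default_factory=list)
    enabled: bool = True


Item = Union[Step, Group]


def active_steps(items: list[Item]) -> list[Step]:
    """Flatten groups and skip disabled items (a disabled group skips all its steps)."""
    result: list[Step] = []
    for item in items:
        if not item.enabled:
            continue
        if isinstance(item, Group):
            result.extend(active_steps(item.steps))
        else:
            result.append(item)
    return result


def run(items: list[Item], rows: list[Row]) -> list[Row]:
    """Apply only the active steps, in order."""
    for step in active_steps(items):
        rows = step.func(rows)
    return rows


double = Step("double", lambda rows: [{**r, "v": r["v"] * 2} for r in rows], comment="demo", color="blue")
add_one = Step("add one", lambda rows: [{**r, "v": r["v"] + 1} for r in rows], enabled=False)
cleanup = Group("cleanup", [Step("negate", lambda rows: [{**r, "v": -r["v"]} for r in rows])])

script = [double, add_one, cleanup]
print(run(script, [{"v": 1}, {"v": 2}]))  # [{'v': -2}, {'v': -4}]
```

Disabling rather than deleting keeps a step available for later, which is why it is a convenient way to test the impact of one step on the output.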
What’s next?
In this article, you learned how to use the Prepare recipe for data cleansing, normalization, and enrichment.
Note
When your workflow is in production, you can avoid disturbing it by doing experimental work in a visual analysis in the Lab instead of building recipes directly in the Flow.
Continue getting to know the basics of Dataiku by learning about date handling.
Tip
You can find this content (and more) by registering for the Dataiku Academy course, Visual Recipes. When ready, challenge yourself to earn a certification!