Concept | Prepare recipe#

The Prepare recipe is a visual recipe in Dataiku that allows you to create data cleansing, normalization, and enrichment scripts in a visual and interactive way.

Adding transformation steps to the script#

To prepare your data, you must add steps to the recipe script.

Using the processor library#

An essential advantage of the Prepare recipe is its library of around 100 data processors. Most processors are designed to handle one specific task, such as filtering rows, rounding numbers, extracting regular expressions, concatenating or splitting columns, and much more.
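To make these single-purpose tasks concrete, here is a minimal plain-Python sketch of what a few such processors do to the rows of a dataset. It is not Dataiku code: the function names, column names, and sample rows are all hypothetical, and in the recipe itself you would configure the equivalent steps visually.

```python
import re

# Hypothetical sample rows; the column names are made up for illustration.
rows = [
    {"name": "Ada Lovelace", "price": "12.3456", "ref": "order-0042"},
    {"name": "Alan Turing", "price": "7.5", "ref": "order-0007"},
    {"name": "", "price": "3.14159", "ref": "misc"},
]

def filter_rows(rows, column):
    """Keep only the rows where `column` is non-empty."""
    return [r for r in rows if r[column]]

def round_column(rows, column, digits):
    """Parse `column` as a number and round it to `digits` decimals."""
    return [{**r, column: round(float(r[column]), digits)} for r in rows]

def extract_regex(rows, column, pattern, output):
    """Store the first capture group of `pattern` in `output` (None if no match)."""
    out = []
    for r in rows:
        match = re.search(pattern, r[column])
        out.append({**r, output: match.group(1) if match else None})
    return out

def split_column(rows, column, separator, outputs):
    """Split `column` on `separator` into the columns named in `outputs`."""
    out = []
    for r in rows:
        parts = r[column].split(separator, len(outputs) - 1)
        parts += [""] * (len(outputs) - len(parts))  # pad missing pieces
        out.append({**r, **dict(zip(outputs, parts))})
    return out

# Each function handles one task; chaining them gives a combined result.
result = filter_rows(rows, "name")
result = round_column(result, "price", 2)
result = extract_regex(result, "ref", r"(\d+)", "order_id")
result = split_column(result, "name", " ", ["first_name", "last_name"])
```

Keeping each function narrow mirrors the idea of one processor per task: the variety comes from combining them, not from any single step.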

Screenshot showing the processor library in Dataiku.

Processors let you perform a wide variety and combination of tasks. One processor, for example, provides a Formula language, similar to what you might find in a spreadsheet program. You can use it to create new columns from those already present, drawing on a range of built-in functions.

Another processor even lets you write a Python function that is applied to each row.
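As a rough sketch of what such a function can look like, the example below assumes that the processor calls a function named `process` once per row, passes the row as a dictionary of column values, and uses the returned value to fill a new column. Treat that contract as an assumption to verify against the Dataiku reference documentation for your version. The column names `first_name` and `last_name` are hypothetical.

```python
def process(row):
    # Assumption: `row` is a dict mapping column names to cell values,
    # and a missing or empty cell may show up as None.
    first = (row.get("first_name") or "").strip()
    last = (row.get("last_name") or "").strip()
    # Build the value of the new column from the existing columns.
    return " ".join(part for part in (first, last) if part).title()

# Outside Dataiku, the function can be tried on a hand-written row.
sample_row = {"first_name": " ada", "last_name": "LOVELACE "}
full_name = process(sample_row)
```

Because the function only sees one row at a time, it suits per-row logic rather than calculations that need the whole dataset.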

In addition to directly adding steps from the processor library, you can add steps to the script in a number of other ways.

Using the column context menu#

In the column context menu, Dataiku will suggest steps to add based on the column’s meaning.

For example, Dataiku will suggest parsing date columns or removing rows whose values are invalid for the column meaning. For a text column, it will suggest string transformations, such as converting to lowercase.

Screenshot of a column context menu in Dataiku.

Using the Analyze window#

Another method to add steps to the script is through the Analyze window.

Within a Prepare recipe, the Analyze window can guide data preparation, for example, by merging categorical values.

Screenshot of the Analyze window.

Manually moving the columns#

You can also directly drag columns to adjust their order, or switch from the Table view to the Columns view to apply certain steps to more than one column at a time.

Previewing and applying the script#

When adding new steps to the script, you’ll notice how the step output is immediately visible. This is possible because the step is being applied to the same sample of the dataset found in the Explore tab. The quick feedback allows you to work incrementally, quickly modifying your transformation steps.

Notice that steps in the script constitute a list of instructions. These instructions are not immediately applied to the dataset itself.

For example, adding a Delete Column step removes that column from the step preview, but it does not actually delete the column in the dataset, as it would in a spreadsheet.

Only when you run the recipe will Dataiku execute the instructions on the full input dataset, and thereby produce a new output dataset. The original input dataset remains unchanged.
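The distinction between previewing and running can be sketched in plain Python. This is only a conceptual model, not how Dataiku is implemented: the step constructors, the `apply_script` helper, the sample size, and the data are all hypothetical.

```python
def delete_column(name):
    """Return a step that drops the column `name` from every row."""
    def step(rows):
        return [{k: v for k, v in r.items() if k != name} for r in rows]
    return step

def lowercase(name):
    """Return a step that lowercases the column `name` in every row."""
    def step(rows):
        return [{**r, name: r[name].lower()} for r in rows]
    return step

def apply_script(script, rows):
    """Apply each step in order and return new rows; the input is not modified."""
    for step in script:
        rows = step(rows)
    return rows

# Hypothetical input dataset.
dataset = [
    {"id": i, "city": city, "tmp": "x"}
    for i, city in enumerate(["PARIS", "Lyon", "NICE", "Lille"])
]

# The script is just an ordered list of instructions.
script = [delete_column("tmp"), lowercase("city")]

# Preview: apply the script to a small sample for quick feedback.
SAMPLE_SIZE = 2
preview = apply_script(script, dataset[:SAMPLE_SIZE])

# Run: apply the same script to the full input to build a new output.
output = apply_script(script, dataset)
```

After both calls, `dataset` still contains the `tmp` column and the original capitalization. Only `preview` and `output` reflect the steps.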

Managing the script#

If a script starts to grow in complexity, a number of features can help you manage it. You can:

  • Disable steps.

  • Organize individual steps into groups of steps.

  • Add colors and comments to steps in order to send reminders to yourself and colleagues.

  • Copy and paste steps within the same recipe or to another recipe, even if that recipe is in another project or another Dataiku instance.
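One way to picture these options is to treat each step as a small record carrying metadata alongside its transformation. The sketch below is a hypothetical model in plain Python, not Dataiku's internal representation: the `Step` class, its fields, and the `run` helper are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]

@dataclass
class Step:
    name: str
    func: Callable[[Row], Row]
    enabled: bool = True   # a disabled step stays in the script but is skipped
    comment: str = ""      # a note for yourself or colleagues
    group: str = ""        # a label used to organize related steps

def run(script: List[Step], rows: List[Row]) -> List[Row]:
    """Apply every enabled step, in order, to every row."""
    for step in script:
        if not step.enabled:
            continue
        rows = [step.func(r) for r in rows]
    return rows

script = [
    Step("trim", lambda r: {**r, "city": r["city"].strip()}, group="cleaning"),
    Step(
        "upper",
        lambda r: {**r, "city": r["city"].upper()},
        enabled=False,
        comment="Check with the team before re-enabling",
        group="cleaning",
    ),
]

cleaned = run(script, [{"city": " Paris "}])
```

Disabling a step instead of deleting it keeps its configuration and comment available, so it can be switched back on later without being rebuilt.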

What’s next?#

In this article, you learned how to use the Prepare recipe for data cleansing, normalization, and enrichment. Continue getting to know the basics of Dataiku by learning about date handling.