Tutorial | Dynamic recipe repeat

The dynamic recipe repeat feature allows you to execute multiple runs of a recipe, iterating on a dataset to update parameters each time.

Get started

Objectives

In this tutorial, you will:

  • Create an Export to folder recipe.

  • Create a dataset that defines which data you will export.

  • Use the recipe repeat feature for the Export to folder recipe.

Prerequisites

  • Dataiku 13.2 or later.

  • An advanced understanding of recipes, datasets, and folders, and regular practice using them.

Create the project

To create the project:

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Dynamic Recipe Repeat.

  2. From the project homepage, click Go to Flow (or g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Use case summary

Let’s say you have a dataset with information about the most successful Hollywood movies, and you want to export a report folder that breaks it down into one CSV file per distributor. Each file contains only the films of that distributor. In other words, the recipe splits the dataset into smaller datasets, one per distributor.

The Flow is composed of:

  • The Highest_Hollywood_Grossing_Movies dataset, which contains data on Hollywood movies, including their title, distributor, release date, and genre.

  • The Distinct_Distributors dataset, which lists each unique distributor.
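
As a rough illustration of how the two datasets relate, the sketch below derives a distinct list of distributors from a few movie rows in plain Python. The sample rows and the helper name distinct_distributors are made up for illustration. In the project, the Distinct_Distributors dataset is already provided in the Flow.

```python
# Illustrative sample rows (hypothetical, not the actual tutorial data).
movies = [
    {"Title": "Film A", "Distributor": "Studio One"},
    {"Title": "Film B", "Distributor": "Studio Two"},
    {"Title": "Film C", "Distributor": "Studio One"},
]


def distinct_distributors(rows):
    """Return one row per unique distributor, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        name = row["Distributor"]
        if name not in seen:
            seen.add(name)
            result.append({"Distributor": name})
    return result


# One row per distributor: this list plays the role of Distinct_Distributors.
distributors = distinct_distributors(movies)
```

Each row of this list corresponds to one run of the repeated recipe.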

Create the Export to folder recipe repeat

First, create the recipe that you want to repeat.

  1. Select the Highest_Hollywood_Grossing_Movies dataset.

  2. In the Actions panel, under Other recipes, select the Export to folder recipe.

  3. Under Name for the output, enter Hollywood_Movies_Per_Distributor.

  4. Click Create Folder to confirm.

  5. Click Create Recipe.

Set the repeating settings

Once the recipe is created, we can set the repeating parameters.

  1. Navigate to the Advanced tab.

  2. Select the Enable checkbox in the Repeating recipe panel.

  3. From the Dataset dropdown, select the Distinct_Distributors dataset.

  4. Click Save.

Note

Keep the default Mode. With it, the Distributor column is used as the ${Distributor} variable.

Dataiku screenshot of the distinct visual recipe.

Set the repeating variable

The recipe now repeats once for each distributor. With the repeat set up, we can use the ${Distributor} variable in the recipe settings so that each run exports only that distributor’s films.

  1. Switch back to the Settings tab.

  2. Unselect the With header checkbox.

  3. Toggle on the Filter.

  4. Next to Keep only rows that satisfy, select a formula.

  5. Enter the formula Distributor == '${Distributor}'. Each time the recipe repeats, the ${Distributor} variable takes the next value from the Distinct_Distributors dataset.

  6. Enter ${Distributor}_FILM.csv as the File name.

  7. Click Save.

  8. Click Run.

Dataiku screenshot of the recipe parameters.
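
To make the substitution concrete, here is a minimal sketch that uses Python's string.Template as an analogy for how the ${Distributor} placeholder is replaced on each run. It is not Dataiku's implementation, and the distributor name "Studio One" and the helper expand are hypothetical.

```python
from string import Template

# The same two settings entered in the recipe, written as templates.
filter_template = Template("Distributor == '${Distributor}'")
file_name_template = Template("${Distributor}_FILM.csv")


def expand(variables):
    """Substitute the variables of one run into the filter and file name."""
    return (
        filter_template.substitute(variables),
        file_name_template.substitute(variables),
    )


# Hypothetical variables for one run of the repeated recipe.
formula, file_name = expand({"Distributor": "Studio One"})
```

For this run, the filter keeps only rows whose Distributor is Studio One, and the output file is named Studio One_FILM.csv.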

The new folder will contain one CSV file per distributor listed in the Distinct_Distributors dataset.
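
The overall result can be approximated outside Dataiku with the standard library. The sketch below is only a simulation under stated assumptions: the sample rows, the function export_per_distributor, and the temporary output folder are all hypothetical, and the files are written without a header row to mirror the cleared With header checkbox.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data standing in for the two Flow datasets.
movies = [
    {"Title": "Film A", "Distributor": "Studio One"},
    {"Title": "Film B", "Distributor": "Studio Two"},
    {"Title": "Film C", "Distributor": "Studio One"},
]
distributors = [{"Distributor": "Studio One"}, {"Distributor": "Studio Two"}]


def export_per_distributor(movie_rows, parameter_rows, folder):
    """Write one header-less CSV per parameter row and return the file names."""
    written = []
    for variables in parameter_rows:
        name = variables["Distributor"]
        file_name = f"{name}_FILM.csv"
        # Same idea as the recipe filter: keep only this distributor's rows.
        kept = [row for row in movie_rows if row["Distributor"] == name]
        with open(Path(folder) / file_name, "w", newline="") as handle:
            writer = csv.writer(handle)
            for row in kept:
                writer.writerow([row["Title"], row["Distributor"]])
        written.append(file_name)
    return written


output_folder = tempfile.mkdtemp()
files = export_per_distributor(movies, distributors, output_folder)
```

Listing output_folder afterwards shows one file per row of the distributors list, which is the same shape as the Hollywood_Movies_Per_Distributor folder after the run.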

What’s next?

Congratulations! You’ve seen how to create and configure a recipe repeat in an Export to folder recipe, making your Flow more dynamic.

Check out the Academy for other advanced features to master!