Tutorial | Dynamic recipe repeat#

The dynamic recipe repeat feature allows you to execute multiple runs of a recipe, iterating on a dataset to update parameters each time.

Get started#

Objectives#

In this tutorial, you will:

  • Create an Export to folder recipe.

  • Create a dataset that defines which data you will export.

  • Use the recipe repeat feature for the Export to folder recipe.

Prerequisites#

  • Dataiku 13.2 or later.

  • An advanced understanding and regular use of recipes, datasets, and folders.

Create the project#

To create the project:

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Dynamic Recipe Repeat.

  2. From the project homepage, click Go to Flow (or g + f).

Export multiple files using recipe repeat#

Let’s say you have a dataset of the highest-grossing Hollywood movies, and you want to export a report folder that breaks this dataset down into one CSV file per distributor. Each file would contain only the films released by that distributor. In other words, it’s like partitioning the dataset on the Distributor column.

The Flow starts from the Highest_Hollywood_Grossing_Movies dataset, which describes each movie with columns such as title, distributor, release date, and genre.

Define the list of distributors in a dataset#

First, we need the list of unique distributors. To get it, we can use a Distinct recipe:

  1. Click the Highest_Hollywood_Grossing_Movies dataset.

  2. In the Actions panel, select the Distinct recipe and click on Create recipe.

  3. Under Operation mode, select Find distinct values of a subset of all columns.

  4. In the Available columns panel, select Distributor and move it across by clicking the single right arrow.

  5. Click on Save.

  6. Click on Run.

Dataiku screenshot of the distinct visual recipe.

The one-column output dataset now contains one row per unique distributor.
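As a rough analogy for what the Distinct recipe produces here, the following plain-Python sketch reduces a list of rows to the unique values of one column. The sample rows and the `distinct_values` helper are hypothetical and only illustrate the idea; this is not how Dataiku implements the recipe, and the row order of the real output dataset is not guaranteed to match.

```python
# Hypothetical sample rows standing in for Highest_Hollywood_Grossing_Movies.
movies = [
    {"Title": "Film A", "Distributor": "Studio One"},
    {"Title": "Film B", "Distributor": "Studio Two"},
    {"Title": "Film C", "Distributor": "Studio One"},
]


def distinct_values(rows, column):
    """Return one single-column row per unique value, in first-seen order."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    seen = dict.fromkeys(row[column] for row in rows)
    return [{column: value} for value in seen]


distributors = distinct_values(movies, "Distributor")
print(distributors)
# [{'Distributor': 'Studio One'}, {'Distributor': 'Studio Two'}]
```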

Create the Export to folder recipe repeat#

Now, we can create the recipe that we wish to repeat.

  1. Select the Highest_Hollywood_Grossing_Movies dataset.

  2. In the Actions panel, under the Other recipes dropdown, select the Export to folder recipe.

  3. Under Name for the output, enter Hollywood_Movies_Per_Distributor.

  4. Click Create folder to confirm.

  5. Click Create recipe.

Once the recipe is created, we can set the repeating parameters.

  1. Navigate to the Advanced tab.

  2. Select the Enable checkbox in the Repeating recipe panel.

  3. From the Dataset dropdown, select the Highest_Hollywood_Grossing_Movies_distinct dataset.

  4. Click Save.

Note

Keep the default Mode. With it, the Distributor column of the selected dataset is exposed as the ${Distributor} variable on each run.

Dataiku screenshot of the Repeating recipe settings.
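To picture how a repeated run sees its variables, here is a minimal sketch that uses Python's `string.Template`, which happens to share the `${name}` placeholder syntax. The `repeat_rows` data and the `expand` helper are assumptions made for illustration; Dataiku performs its own variable expansion internally.

```python
from string import Template

# Hypothetical rows of the repeating dataset (a single Distributor column).
repeat_rows = [{"Distributor": "Studio One"}, {"Distributor": "Studio Two"}]


def expand(text, variables):
    """Replace ${name} placeholders with the values of the current row."""
    return Template(text).substitute(variables)


# One run per row: each column name acts as a variable name for that run.
expanded = [expand("Distributor == '${Distributor}'", row) for row in repeat_rows]
for expression in expanded:
    print(expression)
# Distributor == 'Studio One'
# Distributor == 'Studio Two'
```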

Here, we choose to iterate the recipe on each distributor. Now that we have the recipe repeat set up, we can use the iterator variable to accomplish our goal.

  1. Switch back to the Settings tab.

  2. Unselect the With header checkbox.

  3. Toggle on the Filter.

  4. Next to Keep only rows that satisfy, select a formula.

  5. Enter the formula Distributor == '${Distributor}'. Each time the recipe repeats, the ${Distributor} variable takes the value of the next distributor from the distinct dataset we created.

  6. Enter ${Distributor}_FILM.csv as the File name.

  7. Click Save.

  8. Click Run.

Dataiku screenshot of the recipe parameters.

The new folder will contain one CSV file per distributor listed in the Highest_Hollywood_Grossing_Movies_distinct dataset.
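The overall behavior of the repeated export can be sketched in plain Python as a loop over the distributor list: filter the movies, expand the file name, and write one CSV file without a header. The sample data, the `export_per_value` helper, and the temporary output folder are all hypothetical; the sketch mirrors the settings above rather than Dataiku's actual implementation.

```python
import csv
import tempfile
from pathlib import Path
from string import Template

# Hypothetical stand-ins for the input dataset and the distinct dataset.
movies = [
    {"Title": "Film A", "Distributor": "Studio One"},
    {"Title": "Film B", "Distributor": "Studio Two"},
    {"Title": "Film C", "Distributor": "Studio One"},
]
repeat_rows = [{"Distributor": "Studio One"}, {"Distributor": "Studio Two"}]


def export_per_value(rows, repeat_rows, folder, name_pattern):
    """Write one CSV file per repeat row and return the file names written."""
    written = []
    for variables in repeat_rows:
        # Filter step: keep rows matching the current ${Distributor} value.
        kept = [r for r in rows if r["Distributor"] == variables["Distributor"]]
        # File name step: expand ${Distributor}_FILM.csv for this run.
        path = Path(folder) / Template(name_pattern).substitute(variables)
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            # No header row, matching the cleared "With header" checkbox.
            writer.writerows(list(r.values()) for r in kept)
        written.append(path.name)
    return written


output_folder = tempfile.mkdtemp()
files = export_per_value(movies, repeat_rows, output_folder, "${Distributor}_FILM.csv")
print(sorted(files))
# ['Studio One_FILM.csv', 'Studio Two_FILM.csv']
```

Because the file name contains the variable, each run writes to a different file. With a fixed file name, every run in this sketch would overwrite the previous one.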

What’s next?#

Congratulations! You’ve seen how to create and configure a recipe repeat on an Export to folder recipe, making your Flow more dynamic.

Check out the Academy for other advanced features to master!