Concept | Dynamic dataset and recipe repeat#

The dynamic dataset repeat and recipe repeat features let you iterate over the rows of a dataset and use each row as input for reading a dataset or running a recipe.

Use cases#

A dataset or recipe repeat takes a secondary Parameters dataset as a driver and reads or runs, respectively, as many times as there are rows in the Parameters dataset. At each iteration, variables are replaced in the settings of the main dataset (or recipe) based on the current row of the Parameters dataset.
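To make the mechanism concrete, here is a minimal sketch in plain Python of the "one iteration per row, with variables replaced in the settings" behavior. It does not call any Dataiku API: the function `repeat_with_parameters`, the `table_name` column, and the sample table names are hypothetical, and the `${...}` placeholder style is simply the one supported by Python's `string.Template`.

```python
from string import Template


def repeat_with_parameters(setting_template, parameter_rows):
    """Expand a setting once per row of the Parameters dataset."""
    template = Template(setting_template)
    # One expansion per row: each column name acts as a variable name.
    return [template.substitute(row) for row in parameter_rows]


# Hypothetical Parameters dataset: one row per SQL table to process.
parameter_rows = [
    {"table_name": "orders_2023"},
    {"table_name": "orders_2024"},
]

queries = repeat_with_parameters(
    "SELECT COUNT(*) FROM ${table_name}", parameter_rows
)
for query in queries:
    print(query)
```

The Parameters dataset drives the loop, so adding a row adds an iteration without any change to the setting itself.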

This repeat can be used for various use cases, such as:

  • Execute an SQL recipe on each SQL table in a dataset.

  • Use an Export to folder recipe to export multiple files, one for each row listed in a dataset.

Let’s look at one use case each for datasets and recipes.

Dynamic dataset repeat#

You may want to enable the dynamic dataset repeat setting on a dataset when you need to concatenate files from an upstream folder. Instead of converting each file to a dataset and stacking them, you could create a more automated and dynamic pipeline. For instance, if you want to concatenate the three most recently modified files in a source folder each time you rebuild a dataset, you could:

  • Create a dataset that lists all files in the source folder.

  • Use a TopN recipe to retrieve the list of the three most recently modified files.

  • Create a dataset that reads from a folder and uses a variable for the file path.

  • Enable the dynamic dataset repeat setting on the dataset to iterate through the paths listed in the dataset you created with the TopN recipe.

Dataiku screenshot of a flow introducing dataset repeat use case.
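The logic of these four steps can be sketched in plain Python, outside of Dataiku. This is an illustration of the idea only, not how Dataiku implements it: the function names, the `path` and `last_modified` columns, and the assumption that the files are CSV files with a header row are all hypothetical.

```python
import csv
from pathlib import Path


def list_files(folder):
    """Step 1: one row per file, with its path and modification time."""
    return [
        {"path": str(p), "last_modified": p.stat().st_mtime}
        for p in Path(folder).iterdir()
        if p.is_file()
    ]


def top_n_recent(rows, n=3):
    """Step 2: keep the n most recently modified files."""
    return sorted(rows, key=lambda r: r["last_modified"], reverse=True)[:n]


def read_repeated(parameter_rows):
    """Steps 3 and 4: read one file per parameter row and stack the records."""
    records = []
    for row in parameter_rows:
        # The "path" column plays the role of the file path variable.
        with open(row["path"], newline="") as handle:
            records.extend(csv.DictReader(handle))
    return records
```

With a hypothetical folder named `source_folder`, the whole pipeline is `read_repeated(top_n_recent(list_files("source_folder")))`. Rerunning it after new files arrive picks up the new three most recent files without any change to the code, which is the point of driving the read from a list of paths.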

Dynamic recipe repeat#

Let’s assume you need to download several files from different URLs. Instead of downloading one file at a time and creating an untidy Flow with several Download recipes, you can use the recipe repeat setting within a single Download recipe. This way, you can dynamically download and store files in a folder from one list of URLs.

To build this Flow, you can:

  • Create a dataset that records the download URLs of the files you want.

  • Create a Download recipe and enable the recipe repeat setting to iterate over each URL in the dataset that lists them.

  • Use the name of the URL column as a variable in the URL parameter of the Download recipe.

Dataiku screenshot of a flow introducing recipe repeat use case.
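As a rough analogy for this Flow, the sketch below runs one download per row of a URL list and stores each result in a folder. It is plain Python rather than a Dataiku recipe, and `download_repeat`, `fetch`, the `url` column name, and the choice to name each file after the last segment of its URL are assumptions made for the example.

```python
from pathlib import Path
from string import Template
from urllib.request import urlopen


def fetch(url):
    """Default downloader: return the bytes served at url."""
    with urlopen(url) as response:
        return response.read()


def download_repeat(url_setting, parameter_rows, target_folder, downloader=fetch):
    """Run the download once per row, expanding the URL setting each time."""
    target = Path(target_folder)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for row in parameter_rows:
        # Replace the variable in the URL setting with the current row's value.
        url = Template(url_setting).substitute(row)
        # Name the stored file after the last path segment of the URL.
        destination = target / url.rstrip("/").rsplit("/", 1)[-1]
        destination.write_bytes(downloader(url))
        saved.append(destination.name)
    return saved
```

A call such as `download_repeat("${url}", rows, "downloads")`, where `rows` is a list of dictionaries with a `url` key, mirrors a single Download recipe driven by a dataset of URLs. The `downloader` argument is there so the loop can be tried without network access.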

The recipes that support the recipe repeat feature are:

  • Download recipe.

  • SQL recipe.

  • Export to folder recipe.

Configuration#

Let’s take a closer look at the dynamic dataset repeat and recipe repeat configurations. To access the feature, open the dataset or recipe you want to repeat and go to its advanced settings.

Note

While the dynamic recipe repeat settings are located in the Advanced tab, you need to navigate through Settings > Advanced to find the dynamic dataset repeat settings.

The Dataset parameter defines the dataset you use to iterate on. You can choose any of the datasets in your project.

The Mode setting has two options:

  • Make a variable for each column. This will automatically use each column as a variable.

  • Explicitly map columns to variables. This will only create variables from columns if you define them.
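The difference between the two modes can be pictured with a small Python sketch. It models the behavior described above and is not Dataiku code: the function `variables_for_row`, the mode names, and the idea that an explicit mapping can give a variable a different name from its column are assumptions made for illustration.

```python
def variables_for_row(row, mode="per_column", mapping=None):
    """Return the variables available during one iteration.

    mode="per_column": every column becomes a variable of the same name.
    mode="explicit": only columns listed in mapping become variables,
    under the names given there ({column: variable}).
    """
    if mode == "per_column":
        return dict(row)
    if mode == "explicit":
        return {
            variable: row[column]
            for column, variable in (mapping or {}).items()
        }
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical row of the Parameters dataset.
row = {"url": "https://example.com/a.csv", "owner": "alice"}

all_vars = variables_for_row(row)
mapped_vars = variables_for_row(
    row, mode="explicit", mapping={"url": "download_url"}
)
print(all_vars)
print(mapped_vars)
```

In this sketch, the first mode exposes both `url` and `owner`, while the explicit mode exposes only the mapped column, which keeps unrelated columns out of the settings.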

What’s next?#

Continue learning about recipe repeat by working through Tutorial | Dynamic recipe repeat.