Partitioning in a Scenario

Using scenarios, we can automate the build of our partitioned datasets. We can even define schedules, such as daily or monthly, and use keywords or variables to identify which partition to build.

Example: Time-Based Dimension

One benefit of using a scenario is the ability to schedule tasks triggered by fixed time periods. For example, we can create a scenario to run a daily scheduled task to build all partitions in a date range.


To tell Dataiku to select all partitions in a date range, use the “/” character between the two dates.


With this scenario scheduled to run on a daily trigger, Dataiku will build every partition in this date range, every day.



For time-based partitioning, we can also use special keywords. For example, for a daily scheduled task, we could use “PREVIOUS_DAY” to build the partition corresponding to the previous day, every day.


This way, we can automatically select the partition corresponding to the previous day without having to edit the parent recipe in the Flow.


For more examples of ways to use keywords, visit the Variables in scenarios in the in the product documentation.

Variables in Scenarios

To make this process more flexible, we can use variables to specify the target identifier, or the partition identifier. The variable replaces the string that identifies what partitions we want in the output dataset.


Build Partition Corresponding to Last Week

To build a partition corresponding to “last week”, we can use the Define scenario variables step.

In this example, we’ve created five variables:

  • “today” uses the expression “now” and is the end of our date range;

  • “today_id” lets us reference “today”;

  • “last_week” uses an expression that looks back seven days, and is the beginning of our date range;

  • “last_week_id” lets us reference “last_week”; and

  • “partition_range” links the variables, today and last_week, together.


Finally, with our variables defined, we add another Build step to set our target identifier to the third variable we created, partition_range. Applying the proper syntax, we wrap the name of the variable in curly braces and add a “dollar sign” to the front, resulting in ${partition_range}.


When we run this scenario, Dataiku will compute the partitions corresponding to the dates in the partition range.

Example: Discrete-Based Dimension

We can also use scenarios to build partitions along a discrete-based dimension. In this example, we’ll create a custom Python script instead of a sequence of steps.

In the Script tab, we’ve defined the dataset and the discrete partition that we want to build.


When we run this scenario, Dataiku will compute the “internet” partition.

What’s next?

In this lesson, we discovered different ways we can use scenarios to automate the build of our partitioned datasets. You can visit the Dataiku product documentation to find out more about working with Variables in Scenarios.