Partitioning in a Scenario

Using scenarios, we can automate the build of our partitioned datasets. We can even define schedules, such as daily or monthly, and use keywords or variables to identify which partition to build.

Example: Time-Based Dimension

One benefit of using a scenario is the ability to schedule tasks triggered by fixed time periods. For example, we can create a scenario to run a daily scheduled task to build all partitions in a date range.

Note: To tell Dataiku DSS to select all partitions in a date range, use the “/” character between the two dates.

../../_images/build-date-range.png

With this scenario scheduled to run on a daily trigger, DSS will build every partition in this date range, every day.

../../_images/run-scenario-daily.png

Keywords

For time-based partitioning, we can also use special keywords. For example, for a daily scheduled task, we could use “PREVIOUS_DAY” to build the partition corresponding to the previous day, every day.

../../_images/build-previous-day.png

This way, we can automatically select the partition corresponding to the previous day without having to edit the parent recipe in the Flow.

../../_images/using-keywords.png

For more examples of ways to use keywords, visit the Variables in scenarios in the Dataiku DSS reference documentation.

Variables in Scenarios

To make this process more flexible, we can use variables to specify the target identifier, or the partition identifier. The variable replaces the string that identifies what partitions we want in the output dataset.

../../_images/using-variables.png

Build Partition Corresponding to Last Week

To build a partition corresponding to “last week”, we can use the Define scenario variables step.

In this example, we’ve created five variables: “today” uses the expression “now” and is the end of our date range; “today_id” lets us reference “today”; “last_week” uses an expression that looks back seven days, and is the beginning of our date range; “last_week_id” lets us reference “last_week”; and “partition_range” links the variables, today and last_week, together.

../../_images/define-scenario-variables.png

Finally, with our variables defined, we add another Build step to set our target identifier to the third variable we created, partition_range. Applying the proper syntax, we wrap the name of the variable in curly braces and add a “dollar sign” to the front, resulting in ${partition_range}.

../../_images/partition-range-referenced.png

When we run this scenario, Dataiku DSS will compute the partitions corresponding to the dates in the partition range.

Example: Discrete-Based Dimension

We can also use scenarios to build partitions along a discrete-based dimension. In this example, we’ll create a custom Python script instead of a sequence of steps.

In the Script tab, we’ve defined the dataset and the discrete partition that we want to build.

../../_images/build-internet-partition.png

When we run this scenario, Dataiku DSS will compute the “internet” partition.

What’s next?

In this lesson, we discovered different ways we can use scenarios to automate the build of our partitioned datasets. You can visit the Dataiku DSS reference documentation to find out more about working with Variables in Scenarios.