We can use variables in our project Flows to turn high-maintenance, hard-coded information into efficient variables that can be used to develop simpler Flows and make automation tasks more robust.
If you have ever used a webapp that asked you to input a value, or encountered a deceptively simple project Flow with the power to swap out input column values effortlessly, you were probably witnessing variables at work!
In programming, variables store pieces of information, such as the name of a company, or a category ID. This piece of information can then be used many times in our code. When we use variables, we can avoid the cumbersome work of “hard coding” information.
Let’s compare two code snippets that output the same text. Although the output is the same, one is much more robust than the other.
The code snippet on the left is an example of “hard coding”. The company name, Dataiku, is hard-coded: it is typed out three times. Hard-coded information must be maintained manually, which makes it error-prone. If the company name contained a typo or two, we might not find every instance, which would cause errors and inconsistencies in our output.
The code snippet on the right includes a variable, arbitrarily named “cie”, that stores the string value “Dataiku”. Having a variable for the company name not only means more consistent output, it also means we can use this same code snippet for different values of the variable.
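The comparison above can be sketched in Python. This is only an illustration: the sentences are invented for the example, and only the variable name “cie” and the value “Dataiku” come from the text.

```python
# Hard-coded version: the company name is typed out three times,
# so a rename or a typo fix has to be made in three places.
def hard_coded_intro():
    return (
        "Welcome to Dataiku. "
        "This page is about Dataiku. "
        "Thank you for visiting Dataiku."
    )


# Variable version: the name is stored once in `cie` and reused.
# The sentences themselves are made up for this sketch.
def intro_with_variable(cie="Dataiku"):
    return (
        f"Welcome to {cie}. "
        f"This page is about {cie}. "
        f"Thank you for visiting {cie}."
    )


# Both versions produce the same text.
print(hard_coded_intro())
print(intro_with_variable())
```

Calling `intro_with_variable` with a different value reuses the same snippet for another company name, which is the robustness benefit described above.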
Defining a Variable
You can use variables in all Dataiku DSS visual recipes, code recipes, scenarios, and other objects in the Flow. Project variables are only available to a specific project, whereas Global variables are available to the entire Dataiku DSS instance.
Here, we’ve defined project variables, “merch_state” and “merch_category”. When creating variable names, it’s a good idea to use short, descriptive nouns.
To define a variable, wrap both the name and the value in quotes and separate them with a colon. To call the variable in a recipe, use a dollar sign followed by the variable name wrapped in curly braces, as in ${merch_state}.
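As a rough sketch of this syntax, the Python snippet below parses a JSON object of variables and expands `${...}` references in a string. It uses Python's standard `string.Template` only to imitate the substitution; it is not how Dataiku DSS implements it. The value “grocery” and the expression text are assumptions made for illustration.

```python
import json
from string import Template

# Variables written as "name": "value" pairs, as described above.
# "grocery" is a hypothetical value chosen for this sketch.
variables_json = """
{
  "merch_state": "Nevada",
  "merch_category": "grocery"
}
"""
variables = json.loads(variables_json)

# A hypothetical expression that calls the variables with ${name}.
expression = Template(
    "state == '${merch_state}' and category == '${merch_category}'"
)

# Replace each ${name} with the value stored under that name.
expanded = expression.substitute(variables)
print(expanded)
```

Changing a value in the JSON object changes every place where the corresponding `${...}` reference appears, without touching the expression itself.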
We can easily create variables for frequently used and frequently updated information, such as company sector, company year of creation, and logo description.
Variables in a Recipe
Once we’ve defined the value of our variable in our code, we can edit it once, and Dataiku DSS updates it everywhere in our Flow.
Let’s use an example. In this simple Flow, we have a Prepare recipe and three Filter recipes. The purpose of the Flow is to create prepared output datasets that correspond to a specific geographical territory, such as a state, which can then be exported as CSV files.
Upon inspecting this Flow, we can see that the territory is hard-coded. This means that to export a CSV file for a single territory, such as Nebraska, we must first edit the name of the territory in the Prepare recipe and then edit it again in the Filter recipe.
We can replace our hard-coded information using the variables we set up for the project. This will make our Flow much simpler and less error-prone. In addition, when we want to output a CSV file by territory and merchant category, all we have to do is edit the values of the project variables, then run the Flow, all without ever having to edit a recipe!
For example, we can reference our variable “merch_state” in a formula step within the Prepare recipe to output only those records for the value we set up for this variable, which is “Nevada”. Similarly, we can reference the same variable in the Filter recipe.
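The idea of filtering on a variable rather than a hard-coded value can be sketched in plain Python. This only mimics what the recipes do; the sample records and the `filter_by_state` helper are hypothetical, and nothing here calls Dataiku DSS.

```python
# Project variables, edited in one place.
variables = {"merch_state": "Nevada"}

# Hypothetical sample records standing in for the prepared dataset.
transactions = [
    {"id": 1, "state": "Nevada", "amount": 20.0},
    {"id": 2, "state": "Nebraska", "amount": 35.5},
    {"id": 3, "state": "Nevada", "amount": 12.25},
]


def filter_by_state(rows, project_variables):
    """Keep only the rows whose state matches the merch_state variable."""
    target = project_variables["merch_state"]
    return [row for row in rows if row["state"] == target]


# Filter using the current value of the variable.
nevada_rows = filter_by_state(transactions, variables)
print([row["id"] for row in nevada_rows])

# Changing the variable changes the output; the filter code is untouched.
variables["merch_state"] = "Nebraska"
nebraska_rows = filter_by_state(transactions, variables)
print([row["id"] for row in nebraska_rows])
```

The filter logic never mentions a specific state, so switching territories is a one-line change to the variable rather than an edit to each step.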
Variables in Complex Flows
When variables are used globally, and for more complex Flows, such as Dataiku Applications and Webapps, their benefit becomes even more apparent.
A Dataiku Application is an application created using the Application Designer. Starting with Dataiku DSS 8.0, we can convert projects into reusable applications. For example, we can turn our project into a Dataiku Application that makes use of our variables by providing users with a simple interface for selecting which values they want. In this way, users do not have to understand all of the behind-the-scenes details.
Here, the application allows the user to identify which state territory they want to use to build the Flow.
In the project that was used to create the application, we can see that the variable is referenced in the Filter recipe.
In addition, a scenario has been created to build the output dataset filtered by state territory.
The application design uses a tile called Edit project variables where the code references the existing variable, “merch_state”.
Finally, the application design uses a tile called Run scenario. This runs the scenario that was defined in the project.
Similarly, we can create a simple Webapp that allows users to select the value of a variable. A Webapp can be a Visual Webapp or a Code Webapp, which you develop by writing code through the code menu in Dataiku DSS.
In this example, the Webapp re-creates the functionality of the Dataiku Application, where the user identifies the name of a state territory. When the user clicks Run, the Webapp updates the “merch_state” variable, and runs the Filter recipe behind the scenes. This particular Webapp has been designed to output the data in the webapp itself.
Visit Hands-On: Pivot Recipe, where you can use a built-in tutorial on your Dataiku DSS instance to define a variable and apply it as part of a pre-filter in a Pivot recipe.
To try building a more advanced application that uses a customer ID variable, visit the Dataiku Applications Tutorials lesson: Create a Visual Application.