Tutorial | Coding with variables#
Get started#
If you are comfortable using variables in visual recipes, your next step may be to start using them in code to make development tasks more efficient and automation tasks more robust.
Objectives#
In this tutorial, you will:
Retrieve and update project-level variables with get_variables() and set_variables().
Retrieve all variables, including instance-level variables, with dataiku.get_custom_variables().
Prerequisites#
To complete this tutorial, you’ll need the following:
Dataiku version 12.0 or above.
A Python environment that includes the package matplotlib.
Create the project#
From the Dataiku Design homepage, click + New Project > DSS tutorials > Developer > Variables for Coders.
From the project homepage, click Go to Flow (or g + f).
Since the Flow uses matplotlib in a recipe, change the project’s code environment to one that includes this library, as mentioned above.
Note
You can also download the starter project from this website and import it as a zip file.
Define project variables#
Let’s start by defining project variables as a JSON object.
From the top navigation bar, go to … > Variables.
Copy-paste the following JSON under the Global variables section of the Project variables page:
{
  "country_name": "Germany",
  "merchant_url": "costco",
  "most_recent_date": "2011-12-09"
}
Click Save.
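If the page refuses to save, the usual cause is malformed JSON such as a stray comma or quote. As a minimal sketch, the same text can be checked with Python's standard json module before pasting (this runs anywhere and does not need Dataiku):

```python
import json

# The same JSON object that goes into the Global variables section
raw = '{ "country_name": "Germany", "merchant_url": "costco", "most_recent_date": "2011-12-09" }'

# json.loads raises json.JSONDecodeError if the text is malformed
project_vars = json.loads(raw)

for name, value in project_vars.items():
    print(f"{name} = {value!r}")
```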
Get and set project variables in a notebook#
Now let’s retrieve these project variables in a notebook and define new ones programmatically.
Go to the Notebooks page (g + n).
Click + New Notebook > Write your own > Python > Create.
Replace the notebook’s starter code with the following snippet to create a project handle:
import dataiku
# create a project handle
project = dataiku.Project()
Copy-paste and run the following snippet in the notebook to retrieve the project’s variables in the form of a dictionary:
# retrieve variables as a dictionary
variables = project.get_variables()
print(variables)
Note
The variables defined above are standard project variables rather than local ones. Local variables are not included when creating a project bundle.
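For the variables defined earlier, the returned dictionary groups entries under "standard" and "local" keys. The literal below is a hand-written stand-in for illustration, not captured notebook output, so the exact contents on your instance may differ:

```python
# Hand-written stand-in for the dictionary returned by project.get_variables()
variables = {
    "standard": {
        "country_name": "Germany",
        "merchant_url": "costco",
        "most_recent_date": "2011-12-09",
    },
    "local": {},
}

# Read a single standard variable from the nested dictionary
country = variables["standard"]["country_name"]
print(country)
```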
Once you have the project’s variables as a Python dictionary, you can update it and then write it back to the project to replace the current variables.
Copy-paste and run the following snippet in the notebook:
# manipulate the dictionary to update or create any variable
variables["standard"]["your_variable_name"] = "your_variable_value"
# set the updated dictionary
project.set_variables(variables)
Return to the project’s Variables page to see the new your_variable_name defined in the JSON.
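The read-modify-write pattern used in these steps can be wrapped in a small helper. In the sketch below, set_standard_variable and FakeProject are hypothetical names introduced only so the example runs outside Dataiku; in a notebook you would pass the handle from dataiku.Project() instead of the stand-in.

```python
import copy


def set_standard_variable(project, name, value):
    """Read the project's variables, set one standard variable, write them back."""
    variables = project.get_variables()
    # setdefault guards against a missing "standard" section
    variables.setdefault("standard", {})[name] = value
    project.set_variables(variables)
    return variables


class FakeProject:
    """Hypothetical stand-in exposing the same two methods as the project handle."""

    def __init__(self):
        self._variables = {"standard": {"country_name": "Germany"}, "local": {}}

    def get_variables(self):
        return copy.deepcopy(self._variables)

    def set_variables(self, variables):
        self._variables = copy.deepcopy(variables)


project = FakeProject()
set_standard_variable(project, "your_variable_name", "your_variable_value")
print(project.get_variables()["standard"])
```

Because the helper always starts from the current variables, existing entries such as country_name are preserved rather than overwritten.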
Use variables in a code recipe#
Notice that the get_variables() and set_variables() methods belonged to dataiku.Project(), and so only returned project variables.
In some cases though, you may want to access other kinds of variables, such as those at the instance-level. For these cases, look to dataiku.get_custom_variables().
To demonstrate, the second Python recipe in the Flow creates a 3D scatter plot of order totals based on customer ages and times of purchase. However, it only does so for one hard-coded country. Let’s replace this with the country_name variable, which has already been defined.
From the Flow, open the second Python recipe which outputs the managed folder order_total_3D_scatter_plot.
Find line 15 where United States is hard-coded.
df_filtered = df[df['MerchantIP_country'] == 'United States']
Replace that line with the following two lines of code:
country_name = dataiku.get_custom_variables()["country_name"]
df_filtered = df[df['MerchantIP_country'] == country_name]
Important
Notice the difference between project.get_variables() demonstrated above and dataiku.get_custom_variables() demonstrated here. Run the latter in the previous notebook to see what other variables are available beyond those found in the project variables JSON.
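One way to see what the second call adds is to compare the variable names from both sources. The sketch below uses hand-written stand-in dictionaries, and example_instance_variable is a hypothetical name; in the notebook you would substitute the results of project.get_variables() and dataiku.get_custom_variables().

```python
# Hand-written stand-ins for the two results; not captured output
project_variables = {"standard": {"country_name": "Germany"}, "local": {}}
custom_variables = {
    "country_name": "Germany",
    "example_instance_variable": "some value",  # hypothetical instance-level variable
}

# Names defined at the project level (standard and local)
project_names = set(project_variables["standard"]) | set(project_variables["local"])

# Names that only appear in the wider set of variables
extra_names = sorted(set(custom_variables) - project_names)
print(extra_names)
```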
Tip
Here we provided the correct code for you, but you can also let Dataiku insert it for you.
Click Validate.
Navigate to the Variables tab to the left of the recipe code.
Place your cursor where you want to insert the code.
Click on the variable you need to insert a reference.
Click Run (or use the keyboard shortcut @ + r + u + n) to execute the recipe.
When the job finishes, find the new scatter plot for Germany (as defined by the variable) in the order_total_3d_scatter_plot folder.
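The lookup used in the recipe raises a KeyError if country_name is not defined. Where that is a concern, a lookup with a fallback is one option. In this minimal sketch, resolve_country is a hypothetical helper and the dictionary stands in for the result of dataiku.get_custom_variables():

```python
def resolve_country(custom_variables, default="United States"):
    """Return the country_name variable, or a default if it is not defined."""
    return custom_variables.get("country_name", default)


# Stand-in for dataiku.get_custom_variables(); hand-written for illustration
custom_variables = {"country_name": "Germany"}

print(resolve_country(custom_variables))  # variable is defined
print(resolve_country({}))                # variable is missing, default is used
```

Whether a silent default is preferable to a hard failure depends on the job: for automation, failing loudly on a missing variable is often the safer choice.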
Tip
As an exercise, customize the name of the output image file by using the variable country_name.
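One possible approach to this exercise is to build the file name from the variable's value. The sketch below assumes a hypothetical helper named plot_filename and a PNG output; adapt it to however the recipe actually writes its image.

```python
import re


def plot_filename(country_name, prefix="order_total_3d_scatter_plot"):
    """Build a file-system-friendly image name that includes the country."""
    # Replace runs of non-alphanumeric characters with a single underscore
    slug = re.sub(r"[^A-Za-z0-9]+", "_", country_name).strip("_").lower()
    return f"{prefix}_{slug}.png"


print(plot_filename("Germany"))
print(plot_filename("United States"))
```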
What’s next?#
Once you have a basic handle on programmatically working with variables, you’ll find that they can be used in many other places of Dataiku beyond notebooks and recipes, such as scenarios and Dataiku applications.
Next, you may be interested in exploring the Developer learning path in the Academy or the Developer Guide!