Tutorial | Variables for coders#

Hard-coded text strings are difficult to maintain and cannot be programmatically updated. In programming, variables store pieces of information, such as the name of a country or the name of a merchant. Once defined, a variable can be reused many times in visual and code recipes, making development tasks more efficient and automation tasks more robust.

Get started#

In this tutorial, we will refactor an eCommerce transactions project. We’ll turn hard-coded text strings into project variables that can be programmatically updated.

Objectives#

The objectives for this tutorial are as follows:

  • Define project-level variables.

  • Use project variables in visual and code recipes.

  • Create an automation scenario to update a variable. This will enable you to build an application to programmatically update the variable.

Prerequisites#

To complete this tutorial, you’ll need the following:

The 14-Day Free Online Trial contains a code environment, dash, that includes everything you need to complete the courses in the Developer learning path.

Note

This tutorial was tested using a Python 3.6 code environment. Other Python versions may be compatible.

Create the project#

  1. From the Dataiku Design homepage, click +New Project > DSS tutorials > Developer > Variables for Coders.

  2. From the project homepage, click Go to Flow (or G+F).

Note

You can also download the starter project from this website and import it as a zip file.

Set the code environment#

Let’s designate this project’s code environment: one that uses Python 3.6 and has the package matplotlib (which will be needed later).

  1. From the More options menu in the top navigation bar, select Settings > Code env selection.

  2. Change the default Python code env by changing the Mode to Select an environment and the Environment to the chosen code env.

  3. Click Save and return to the Flow.

Build the Flow#

The initial starter Flow contains empty datasets. To work with these datasets, we’ll need to build the Flow.

  1. From the bottom-right corner of the Flow, click Flow Actions.

  2. Select Build all and keep the default selection for handling dependencies.

  3. Click Build to confirm.

  4. Wait for the build to finish, and then refresh the page to see the built Flow.


Explore the Flow#

Let’s look for areas in the Flow where we could use project variables.

  1. Open the Python recipe which outputs the managed folder (on the right of the Flow).

    Screenshot of the opened Python recipe.
  2. Notice in line 15 that the country, United States, is hard-coded.

This recipe outputs a 3D scatter plot of order totals by customer age and time of purchase, restricted to a single country (in this case, United States).


We want to avoid having to manually update the country name each time we build the scatter plot. We’ll replace the text string, United States, with a project variable so that we can programmatically update it.

We also want to output a dataset of transactions for the “most recent date”, grouped by purchase hour, for the specified merchant URL.

To meet these business requirements, we’ll define three project variables.

Define project variables#

In this section, we’ll define our project variables.

  1. From the top navigation bar, go to the More Options menu, then choose Variables.

  2. Define the following project variables under the Global variables section:

    {
    "country_name": "Germany",
    "merchant_url": "costco",
    "most_recent_date": "2011-12-09"
    }
    

    Using this code, we’ve specified arbitrary values as the initial values for the global variables.

    Screenshot of the defined global variables.
  3. Click Save.

  4. Return to the Flow.
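
The variables defined in step 2 are plain JSON, and every value is a string. The following is a minimal sketch, using only the Python standard library, of how such a block could be sanity-checked before it is pasted into the Variables screen. The `validate_variables` helper and the checks it performs are illustrative assumptions and are not part of Dataiku.

```python
import json
from datetime import datetime

# The same JSON entered in the Variables screen in this tutorial.
RAW_VARIABLES = """
{
  "country_name": "Germany",
  "merchant_url": "costco",
  "most_recent_date": "2011-12-09"
}
"""


def validate_variables(raw):
    """Parse the JSON text and check the three keys this tutorial relies on.

    Hypothetical helper, for illustration only.
    """
    variables = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    for key in ("country_name", "merchant_url", "most_recent_date"):
        value = variables.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"'{key}' must be a non-empty string")
    # Require a zero-padded YYYY-MM-DD date by parsing it and formatting it back.
    date_text = variables["most_recent_date"]
    parsed = datetime.strptime(date_text, "%Y-%m-%d")
    if parsed.strftime("%Y-%m-%d") != date_text:
        raise ValueError("'most_recent_date' must look like YYYY-MM-DD")
    return variables


variables = validate_variables(RAW_VARIABLES)
print(variables["country_name"])
```

A check like this catches a missing key or a malformed date early, before a recipe runs against the wrong value.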

Next, we’ll use these newly defined project variables within code recipes in the Flow.

Modify the Flow to use the project variables#

In this section, we will refactor the project to use project variables.

Update the Python recipe#

Let’s use the country_name variable within our Python recipe so that the 3D scatter plot is created dynamically.

  1. Open the Python recipe that outputs the managed folder.

  2. Get the custom variable country_name with the function get_custom_variables(), just before filtering the DataFrame.

    country_name = dataiku.get_custom_variables()["country_name"]
    
  3. Replace the hard-coded string United States with country_name.

    df_filtered = df[df['MerchantIP_country'] == country_name]
    
    The two code lines to be changed

    Hint

    Click the Variables tab in the left-side panel. To see the list of variables, first validate your code. Then click on the variable to insert it into your code.

    Variables tab displaying the used variable in the code.
  4. Run the recipe.

  5. Wait for the job to complete, then click to view the folder order_total_3d_scatter_plot.

  6. Open the new scatter plot.

The plot is for transactions in Germany, the country defined in our project variables. As an exercise, you can try to customize the name of the output image file by using the variable country_name. Later, we’ll create a scenario and a Dataiku application so that others can choose the country to be plotted.
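
For the exercise mentioned above, one possible approach is sketched here. It reads country_name the same way as the recipe and derives a file name from it. The `plot_filename` helper, the file name prefix, and the fallback value used outside Dataiku are assumptions made for illustration; the recipe's actual code for saving the image is not shown in this tutorial, so adapt the idea to whatever that code does.

```python
import re

try:
    import dataiku  # only importable inside a Dataiku code environment

    country_name = dataiku.get_custom_variables()["country_name"]
except ImportError:
    # Assumption: fall back to the tutorial's initial value when run elsewhere.
    country_name = "Germany"


def plot_filename(country, prefix="order_total_3d_scatter_plot"):
    """Build a file name from the country, e.g. for 'United States'.

    Hypothetical helper: lowercases the name and replaces anything that is
    not a letter or digit with an underscore.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", country.lower()).strip("_")
    return f"{prefix}_{slug}.png"


print(plot_filename(country_name))
```

Including the country in the file name means that plots for different countries would no longer overwrite each other in the managed folder.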

Create a Group recipe#

Recall that we want to be able to output a dataset of transactions for the most recent date, grouped by purchase hour, for a merchant URL. To do this, let’s create a Group recipe.

  1. Return to the Flow and select the ecommerce_transactions_with_ip_prepared dataset.

  2. Open the Actions right-side panel and select the Group recipe.

  3. Configure the New Group recipe to group by PurchaseHour, and name the output dataset recent_transactions_per_hour.

  4. Click Create Recipe.

    Group recipe creation
  5. Go to the recipe’s Pre-filter step and turn on the Filter.

  6. Keep only rows that satisfy a formula.

  7. Type the following formula in the formula editor.

    val('PurchaseDate') >= '${most_recent_date}' && contains(MerchantURL, '${merchant_url}')
    

    Note

    Incorrect syntax will result in an empty output dataset. For more about variables and which syntax to use with different code recipes, visit Custom variables expansion.

  8. Save and Run the recipe.

The output dataset displays all records where the PurchaseDate is greater than or equal to the most recent date (as defined in our variables), grouped by purchase hour.

Because the filter on MerchantURL uses the merchant_url variable, we can now programmatically update which merchant URL is used when creating this output dataset.
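
To make the pre-filter easier to reason about, here is a rough Python sketch of what it amounts to. Dataiku performs the `${...}` expansion itself; the sketch only mimics it with `string.Template`, which happens to use the same placeholder syntax. The sample rows are invented for illustration, and the sketch assumes PurchaseDate is stored as YYYY-MM-DD text, which may not match the real dataset.

```python
from collections import Counter
from string import Template

variables = {"most_recent_date": "2011-12-09", "merchant_url": "costco"}

formula = (
    "val('PurchaseDate') >= '${most_recent_date}' "
    "&& contains(MerchantURL, '${merchant_url}')"
)
# Mimic the variable expansion: each ${name} is replaced by its value.
expanded = Template(formula).substitute(variables)
print(expanded)

# Invented sample rows, for illustration only.
rows = [
    {"PurchaseDate": "2011-12-09", "PurchaseHour": 10, "MerchantURL": "www.costco.com"},
    {"PurchaseDate": "2011-12-09", "PurchaseHour": 10, "MerchantURL": "costco.co.uk"},
    {"PurchaseDate": "2011-12-08", "PurchaseHour": 11, "MerchantURL": "www.costco.com"},
    {"PurchaseDate": "2011-12-09", "PurchaseHour": 12, "MerchantURL": "example.org"},
]

# Equivalent filter: zero-padded YYYY-MM-DD dates compare correctly as text.
kept = [
    row
    for row in rows
    if row["PurchaseDate"] >= variables["most_recent_date"]
    and variables["merchant_url"] in row["MerchantURL"]
]

# Group the remaining rows by purchase hour and count them.
per_hour = Counter(row["PurchaseHour"] for row in kept)
print(dict(per_hour))
```

The sketch also shows why the quotes around `${most_recent_date}` matter: the expansion is a plain text substitution, so without the quotes the formula would not compare against a string.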

Update variables using an application#

We want to allow other team members to run the Flow with the values they want without having to give them project administrator rights. It would also be nice if these users could work simultaneously and independently of each other. Now that our variables are defined and the Flow is modified, we can create a Dataiku application that will take care of these needs.

A Dataiku application provides a way for non-coders on your team to update variables via an interactive interface.

Create a scenario to build the scatter plot#

Our objective is to build an application that lets a team member select a country (such as United States, Spain, or Germany), create a scatter plot for that country, and publish the scatter plot to a dashboard.

Our application will need a scenario it can run. Before we can start building our application, we’ll need to create the scenario.

  1. Visit the Jobs menu, then choose Scenarios.

    Screenshot of the scenario menu.
  2. Click + Create Your First Scenario.

  3. Name the scenario Build Scatter Plot then click Create.

  4. Go to the Steps tab, then click Add Step.

  5. Select Build/Train and name it 3D Scatter Plot.

  6. Click Add Folder to Build, select the order_total_3d_scatter_plot folder, then click Add.

  7. Set the Build mode to Build only this dataset.

  8. Save the scenario, then Run it to test it.

Scatter plot scenario
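
The scenario above only rebuilds the folder. If you also want code to change the variable before the build, the following sketch shows one way this might look with the Dataiku public API, where a project handle exposes `get_variables()` and `set_variables()` and the Global variables appear under the "standard" key. The `set_country` helper and the `FakeProject` stand-in are hypothetical; the stand-in exists only so that the sketch runs outside Dataiku.

```python
def set_country(project, country):
    """Update the country_name global variable on a project handle.

    `project` is expected to behave like a Dataiku project handle:
    get_variables() returns a dict with "standard" and "local" keys,
    and set_variables() saves that dict back.
    """
    variables = project.get_variables()
    variables["standard"]["country_name"] = country
    project.set_variables(variables)
    return variables["standard"]


class FakeProject:
    """Stand-in used to exercise set_country outside Dataiku (illustrative)."""

    def __init__(self):
        self._variables = {"standard": {"country_name": "Germany"}, "local": {}}

    def get_variables(self):
        return self._variables

    def set_variables(self, variables):
        self._variables = variables


project = FakeProject()
print(set_country(project, "Spain"))
```

Inside Dataiku, the handle would presumably come from something like `dataiku.api_client().get_default_project()` instead of `FakeProject`; treat that as an assumption and check it against your instance's API documentation.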

Build the application#

Now that we have created the scenario, we can start building our application that will allow a user to choose which country’s data to plot.

When you’ve completed this tutorial, your application will look similar to this:

Application to be expected when finishing the tutorial

To build the application:

  1. Visit the More Options menu, then choose Application designer.

  2. Click Convert into visual application. This will open the Application Designer.

    Convert application to visual.
  3. In the Included Content section, click +Add to add a dataset.

  4. Select the dataset, ecommerce_transactions_with_ip_prepared.

  5. Add a Managed folder and select the order_total_3d_scatter_plot folder.

  6. Save your changes.

    Application data and folders.

Now, we’ll need to give the user the ability to edit the variables.

  1. Click Add Section.

  2. Title the section View 3D Scatter Plot, and give it a description like, “Choose the country you want to filter the dataset on and view the scatter plot.”

    Application instance title input
  3. Click Add tile, then choose Edit project variables.

  4. Add the title, Select Country, and set the behavior to Open modal to edit.

  5. Replace the starter script in the Auto-generated controls with the following:

    [
       {
        "name": "country_name",
        "type": "SELECT",
        "label": "Desired Country",
        "mandatory": true,
        "canSelectForeign": false,
        "markCreatedAsBuilt": false,
        "allowDuplicates": true,
        "selectChoices": [
           {
              "value": "United States",
              "label": "United States",
              "showInColumnPreview": false,
              "selected": false
           },
           {
              "value": "Spain",
              "label": "Spain",
              "showInColumnPreview": false,
              "selected": false
           },
           {
              "value": "Germany",
              "label": "Germany",
              "showInColumnPreview": false,
              "selected": false
           }
        ],
        "getChoicesFromPython": false,
        "canCreateDataset": false
       }
    ]
    
    Edit page of project variables.
  6. Save your changes.
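
The control definition in step 5 repeats the same structure for every country. If the list of countries grows, the JSON could be generated rather than typed by hand. The sketch below is one way to do that with the standard library; the `country_select_control` helper is hypothetical, and it copies the keys from the script in step 5 without asserting which of them are strictly required.

```python
import json


def country_select_control(countries):
    """Return a control definition shaped like the script in this tutorial.

    Hypothetical helper: emits one selectChoices entry per country name.
    """
    choices = [
        {
            "value": country,
            "label": country,
            "showInColumnPreview": False,
            "selected": False,
        }
        for country in countries
    ]
    return [
        {
            "name": "country_name",
            "type": "SELECT",
            "label": "Desired Country",
            "mandatory": True,
            "canSelectForeign": False,
            "markCreatedAsBuilt": False,
            "allowDuplicates": True,
            "selectChoices": choices,
            "getChoicesFromPython": False,
            "canCreateDataset": False,
        }
    ]


controls = country_select_control(["United States", "Spain", "Germany"])
# Paste the printed JSON into the Auto-generated controls editor.
print(json.dumps(controls, indent=3))
```

Each value must match the country names as they appear in the MerchantIP_country column, since the Python recipe filters on an exact match.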

Note

This is just one way to allow the user to interact with the application. There are other types of interactions available. Visit Dataiku Applications: Edit Project Variables to find out more.

Next, we need to tell the application to run the existing scenario to produce the new scatter plot.

  1. Click Add Tile.

  2. Select Run scenario.

  3. Name it Build 3D Scatter Plot and choose the scenario Build Scatter Plot.

    Choosing the created 3D scatter scenario to add.

Finally, we’ll add a button that allows the user to view the scatter plot.

  1. Click Add Tile and select View folder.

  2. Name it View Scatter Plot and select the order_total_3d_scatter_plot folder.

  3. Save the application.

Test the application#

Let’s test our new application.

  1. Click the dropdown arrow next to the Test button in the top-right corner of the page.

  2. Select Create or update test instance (full).

Dataiku begins to create the test instance. After a short while, Dataiku displays the test instance. The title of the test instance depends on the title of your project.

To test the app, select a country for which you haven’t yet seen the scatter plot.

Note

If Dataiku displays an error when you try to view the scatter plot in the folder, check your scenario step to ensure the Build mode is set to Build only this dataset. Then, ensure your application includes the required dataset and folder in the Included Content.

What’s next?#

In a short period of time, we were able to refactor a project with hard-coded information using project variables that we could then programmatically update.

Specifically, we learned how to:

  • Define project-level variables.

  • Use project variables in a Python recipe.

  • Create a scenario enabling us to build a Dataiku application where users could choose the value of a variable.

Next, you can continue your learning journey by completing other courses available in the Developer learning path!