Tutorial | Code recipes (Developer part 2)

Code notebooks allow for free experimentation, but you’ll need code recipes to build outputs in your Flow.

Note

Before beginning this tutorial, you may wish to review a code recipe concept article.

Objectives

In this tutorial, you will:

  • Create a code recipe from a notebook.

  • Sync code back and forth between a notebook and recipe.

  • Run a code recipe.

Starting here?

If you skipped the previous section and just want to focus on code recipes, you need to:

  1. Satisfy the prerequisites.

  2. Create the project (+ New Project > DSS tutorials > Developer > Code Recipes) or download and import the zip file from this website.

  3. Build the Flow containing data from the fictional Haiku T-Shirt company.

Note

This tutorial only covers the use of Python recipes in Dataiku, but the logic is similar for all code recipes. To learn more about using R recipes and, more broadly, R in Dataiku, follow this Academy course.

Create a recipe from a notebook

Code recipes can be created from the Flow, but you can also create them from existing notebooks. This can be particularly useful for deploying exploratory work from notebooks to the Flow.

In this exercise, we will create a Python code recipe from the Jupyter notebook that was created in the Code Notebooks tutorial.

  1. Navigate to the Notebooks page and open the orders analysis Python notebook.

  2. Within the notebook, click Create Recipe in the top right bar.

  3. Select Python recipe and click on OK to confirm.

    Dataiku screenshot of the dialog for creating a recipe from a notebook.
  4. Select the orders dataset as the input (since that is the dataset used in the notebook).

  5. Create a new output dataset and name it orders_by_customer.

  6. Click Create Recipe.

Edit code in a recipe

In the resulting recipe, all the code from the Jupyter notebook has been transferred to the recipe code editor. Notice that Dataiku has added a number of commented-out lines, each marking the beginning of a notebook cell. This way, if we need to edit the recipe in a notebook again, the existing cell structure is preserved.

The editor has also added two lines for the recipe output based on the name of the output dataset we created in the recipe dialog. We’ll discuss this below.

Dataiku screenshot of a Python recipe created from a notebook.

We now want to use the notebook code as a basis for a simple group-by operation that groups orders by customer and aggregates their past interactions. In the Basics courses, we accomplished this with a visual Group recipe, but it can just as easily be done with Python code.

Before adding the relevant code, let’s comment out the following lines, which were useful for displaying insights in the notebook but serve no purpose in a recipe.

  1. Comment out the following lines of code:

    • df.head()

    • pdu.audit(df)

  2. In a new line right below df = dataset_orders.get_dataframe(), enter the following code:

    orders_by_customer_df = (
        df.assign(total=df.tshirt_price * df.tshirt_quantity)
          .groupby(by="customer_id")
          .agg({"pages_visited": "mean", "total": "sum"})
    )
    
    Dataiku screenshot of a Python recipe with grouping function.

This creates a new dataframe with one row per customer_id. For each customer, it computes the average number of pages on the Haiku T-shirt website that the customer visited across their orders, and the total value of their orders, where the value of each order is the t-shirt price multiplied by the number of t-shirts purchased.
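To see what this chain computes outside of Dataiku, here is a minimal standalone sketch that runs the same pandas expression on a few made-up rows. The customer IDs and values are invented for illustration, and only the columns the recipe uses are included; the hypothetical `df` simply stands in for the dataframe returned by `get_dataframe()`.

```python
import pandas as pd

# Made-up sample rows standing in for the orders dataset
# (only the columns used by the recipe are included).
df = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2"],
    "pages_visited": [4, 6, 3],
    "tshirt_price": [20.0, 25.0, 15.0],
    "tshirt_quantity": [1, 2, 3],
})

# Same chain as the recipe: compute each order's value ("total"),
# then aggregate per customer (mean of pages visited, sum of order values).
orders_by_customer_df = (
    df.assign(total=df.tshirt_price * df.tshirt_quantity)
      .groupby(by="customer_id")
      .agg({"pages_visited": "mean", "total": "sum"})
)

print(orders_by_customer_df)
```

Customer c1 has two orders, so its pages_visited value is the mean of 4 and 6 (5.0) and its total is 20.0 × 1 + 25.0 × 2 = 70.0; customer c2 gets 3.0 and 45.0.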

Finally, Dataiku has added lines for the recipe output. However, it cannot know which dataframe (df or orders_by_customer_df) we want to output as the orders_by_customer dataset. Accordingly:

  1. In the last line of code, change pandas_dataframe to orders_by_customer_df.

  2. Click Validate to check the validity of the code. It should display Validation successful.

  3. Run the recipe.

Dataiku screenshot of a Python recipe having been run.

When it completes, explore the output dataset.

Notice that the output dataset does not contain the customer_id column, even though this was the key we grouped by. We’d like to have it for reference.

Dataiku screenshot of an output dataset from a Python recipe.
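The missing column comes from how pandas groups data: by default, `groupby()` turns the grouping key into the index of the result instead of keeping it as a column. The minimal sketch below shows this with made-up data. The output dataset above suggests that the recipe writes only the dataframe’s regular columns and not its index, which would explain why customer_id disappears; treat that last point as an inference from this tutorial’s result rather than a documented rule.

```python
import pandas as pd

# Made-up sample data; only the grouping behavior matters here.
df = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2"],
    "total": [20.0, 50.0, 45.0],
})

grouped = df.groupby(by="customer_id").agg({"total": "sum"})

# The grouping key is no longer a regular column...
print("customer_id" in grouped.columns)  # False
# ...it has become the (named) index of the result.
print(grouped.index.name)                # customer_id
print(list(grouped.index))               # ['c1', 'c2']
```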

Edit the contents of a code recipe in a code notebook

  1. To diagnose the issue, reopen the Python recipe (clicking Parent Recipe is one option).

  2. In the top right corner, click Edit in Notebook. This will open a Jupyter notebook with the recipe code where we can interactively test it.

    Tip

    Although it’s not the case in this tutorial, the code of a recipe often appears in a single cell when you edit it in a Jupyter notebook. You can use the Ctrl + Shift + - shortcut to split the code into multiple cells.

  3. Uncomment the df.head() line and change df to orders_by_customer_df, so that it reads as follows:

    orders_by_customer_df.head()
    
  4. Run the first three cells. The output shows that the orders_by_customer_df dataframe does contain the customer_id information; however, customer_id is stored as the dataframe’s index rather than as a regular column.

    Dataiku screenshot of a Python notebook.
  5. To turn the customer_id index back into a regular column, add .reset_index() to the code that defines the dataframe so that it looks like the following:

    orders_by_customer_df = (
        df.assign(total=df.tshirt_price * df.tshirt_quantity)
          .groupby(by="customer_id")
          .agg({"pages_visited": "mean", "total": "sum"})
          .reset_index()
    )
    
  6. Re-run the third cell to see how the dataframe has changed.

    Dataiku screenshot of a Python notebook.
  7. In the top right corner, click Save back to recipe.

  8. Comment out orders_by_customer_df.head(), validate, and run the recipe again.

Now the output dataset contains a customer_id column.
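The fix works because `reset_index()` moves the index back into an ordinary column. As a minimal sketch with made-up data, the example below compares that approach with an alternative this tutorial does not use: passing `as_index=False` to `groupby()`, which keeps the key as a column from the start. Both produce the same three columns.

```python
import pandas as pd

# Made-up sample data standing in for the orders dataset.
df = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2"],
    "pages_visited": [4, 6, 3],
    "tshirt_price": [20.0, 25.0, 15.0],
    "tshirt_quantity": [1, 2, 3],
})

# Option 1 (used in this tutorial): aggregate, then move the index back into a column.
with_reset = (
    df.assign(total=df.tshirt_price * df.tshirt_quantity)
      .groupby(by="customer_id")
      .agg({"pages_visited": "mean", "total": "sum"})
      .reset_index()
)

# Option 2 (alternative): tell groupby not to use the key as the index at all.
with_as_index = (
    df.assign(total=df.tshirt_price * df.tshirt_quantity)
      .groupby(by="customer_id", as_index=False)
      .agg({"pages_visited": "mean", "total": "sum"})
)

print(list(with_reset.columns))  # ['customer_id', 'pages_visited', 'total']
print(list(with_as_index.columns))
```

`reset_index()` is a one-line addition to the existing chain, while `as_index=False` avoids creating the index in the first place; which one to prefer is largely a matter of style.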

Dataiku screenshot of a dataset after a Python recipe.

What’s next?

This tutorial used the built-in code environment, but often you’ll want to use your own. Learn about code environments in the next tutorial!

Note

To learn more, see the reference documentation on code recipes.