Hands-On Tutorial: Using Visual Recipes to Perform In-database Operations

Many visual recipes (Group, Join, Stack, etc.) can also perform in-database operations. For example, we are now ready to enrich the customers_stacked_prepared dataset with information about the orders that customers have made.

  • Click on customers_stacked_prepared, and from the Actions menu choose Join with… from the list of visual recipes.

  • Select orders_by_customer as the second input dataset.

  • Change the name of the output dataset to customers_enriched.

  • Note that the output dataset will be stored in the same database.

  • Click Create Recipe.

From the Basics 103 course, you will be familiar with the mechanics of the Join recipe, but this time we can execute it in-database.

  • In the Join step, change the join type to an Inner join in order to return only customers who have made an order. Note that Dataiku DSS has automatically determined the join keys customerID and customer_id, even though they have different names in the two datasets.

  • In the Selected columns step, we can deselect customer_id from the orders_by_customer dataset because it is redundant with customerID in the customers_stacked_prepared dataset.

  • Before running the recipe, check the Output step. Click View Query to preview the SQL code.

  • Before running the recipe, click the gear icon underneath the Run button to confirm that the recipe will be executed using the in-database engine.

  • We’re satisfied with this, so click Run and update the schema.

At this point, your Flow should resemble the image below.

../../../_images/sql-dss-flow-after-join.png