Tutorial | SQL notebooks and recipes#

Get started#

Although many visual recipes can be run in-database, at times you may want the flexibility of writing your own SQL queries. You can prototype such queries in a SQL notebook, and then use them in the Flow as SQL recipes.

Objectives#

In this tutorial, you will:

  • Query a SQL database from a notebook.

  • Run a SQL query in-database via a SQL query recipe.

Prerequisites#

Create the project#

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Core Designer > SQL Notebooks and Recipes.

  2. From the project homepage, click Go to Flow.

Note

You can also download the starter project from this website and import it as a zip file.

Change connections#

The datasets in the Flow are not yet synced to your specific SQL database. Let’s fix that first.

  1. From the Flow, multi-select all datasets computed by Dataiku (all except the two leftmost uploaded files).

  2. Near the bottom of the Actions sidebar, select Change connection.

  3. For the new connection field, select an available SQL connection.

  4. Click Save.

Note

For more information, see Concept | Connection changes.

You’ll next want to build the Flow.

  1. Click Flow Actions at the bottom right of the Flow.

  2. Click Build all.

  3. Keep the default settings and click Build.

Use case summary#

The Flow begins from two uploaded CSV files:

Dataset

Description

orders

Order information for t-shirts, including the order’s date, the type of t-shirt, and the customer who placed the order.

customers_stacked

Additional information about the customers who have placed a t-shirt order.

Use a SQL notebook#

In-database charts are one way to efficiently visualize a database dataset. In other cases, you’ll want the freedom of a SQL notebook for prototyping code and analyzing data.

Create a SQL notebook#

Let’s start by creating a SQL notebook.

  1. From the Flow, select the orders_copy dataset.

  2. In the right sidebar, navigate to the Lab.

  3. Under Code Notebooks, select New.

  4. Select SQL as the language.

  5. Click Create.

Dataiku screenshot of the dialog for creating a SQL notebook.

Important

One SQL notebook can only be attached to one SQL connection.

Run the default SQL query#

It’s helpful to start with a basic query to make sure we are querying the correct table.

  1. Click + Query to initiate the first query cell of the notebook.

  2. Select the Tables tab on the left to see available tables.

  3. Click the plus icon to the right of the orders_copy table name to paste a starter query into the editor.

  4. Click Run near the top right (or use the keyboard shortcut cmd/ctrl + enter) to run the query and observe the results.

Dataiku screenshot of a SQL query in a SQL notebook.

Important

Your exact SQL query will appear differently depending on whether you are using Dataiku Cloud or a self-managed instance, the schema of your SQL database, and your project key.

Write a SQL query#

Now that we have identified the correct table name, let’s demonstrate a simple query grouping orders by t-shirt categories.

  1. Copy-paste the following SQL query into the existing cell, adjusting the FROM clause for your instance and table name.

    SELECT "tshirt_category", COUNT(*) AS "count"
    FROM "{INSTANCE_ID}"."DKU_TUTORIAL_SQL_ORDERS_COPY"
    GROUP BY "tshirt_category"
    
  2. Above the query, click to name it order count by tshirt category.

  3. Click Run near the top right to run the query and observe the results.

Dataiku screenshot of a SQL query in a SQL notebook.

Important

In addition to query cells, a SQL notebook can also have markdown cells (+ Markdown), which can be useful for providing commentary or discussion around SQL queries.

Use a SQL query recipe#

Queries in a notebook exist only in the Lab. They are not part of the Flow. To create an output dataset from a SQL query, we’ll need to turn the query into a SQL code recipe.

Create a SQL query recipe#

Once we’re satisfied prototyping a query, we can turn it into a recipe to create an output in the Flow.

  1. Above the order count by tshirt category query, click Create Recipe.

  2. In the Output column, click Set, and name the output dataset orders_by_tshirt_category.

  3. Click Create Dataset.

  4. Click Create Recipe.

Dataiku screenshot of the dialog for a SQL query recipe.

Run a SQL query recipe#

Now we have the same query, but in a recipe format with a defined output. This recipe will start a job that creates the output table in the database and fills it with the results of the query.

  1. To make the query more readable, click Format Code near the bottom.

  2. Although not required here, click Validate to have Dataiku parse the query, verify that the syntax is correct, and fetch the output dataset column names and types.

  3. Click Run to build the output in the Flow, and then open the output dataset when finished.

Dataiku screenshot of a SQL query recipe.

Important

At any time, we can delete this recipe from the Flow, and the original query remains in the notebook.

Iterate between a SQL notebook and a recipe#

Like with Python and R notebooks and recipes, we are able to iterate between experimental code in a SQL notebook and production code in a SQL recipe.

  1. Return to the SQL recipe found in the Flow.

  2. Click Edit in Notebook near the top right.

  3. Now in the notebook, in the third line, change the name of the count column to tshirt_count.

  4. Click Run again to be sure the query still works.

  5. Click Save Back to Recipe to sync your changes back to the Flow.

Dataiku screenshot of a SQL notebook.

Tip

As a matter of comparison, try reproducing the same output visually by creating a Group recipe on orders_copy dataset with tshirt_category as the group key and the default count as the only aggregation. The results should be the same.

What’s next?#

Congratulations! You’ve experimented writing SQL queries in a notebook and saw how to deploy them to the Flow with a SQL recipe.

See also

This was an example of a SQL query recipe. For more complex queries, such as those including CTE expressions, you may need a SQL script recipe. For more on the differences between SQL query and SQL script recipes, see the reference documentation.

To learn more about coding in Dataiku with other languages such as Python and R, see the Developer learning path or the Developer Guide.

Tip

You can find this content (and more) by registering for the Dataiku Academy course, Dataiku & SQL. When ready, challenge yourself to earn a certification!