Tutorial | SQL notebooks and recipes#

Get started#

Although you can run many visual recipes in-database, at times you may want the flexibility of writing your own SQL queries. You can prototype such queries in an SQL notebook, and then use them in the Flow as SQL recipes.

Objectives#

In this tutorial, you will:

Query an SQL database from a notebook.
Run an SQL query in-database via an SQL query recipe.

Prerequisites#

Dataiku 12.0 or later.
An available SQL connection. If you don’t have one yet, see Tutorial | Configure a connection between Dataiku and an SQL database.
You may also wish to review Concept | SQL notebooks and Concept | SQL code recipes.

Create the project#

From the Dataiku Design homepage, click + New Project.
Select Learning projects.
Search for and select SQL Notebooks and Recipes.
If needed, change the folder into which the project will be installed, and click Install.
From the project homepage, click Go to Flow (or type g + f).

From the Dataiku Design homepage, click + New Project.
Select DSS tutorials.
Filter by Core Designer.
Select SQL Notebooks and Recipes.
From the project homepage, click Go to Flow (or type g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Change connections#

The datasets in the Flow aren’t yet synced to your specific SQL database. Let’s fix that first.

From the Flow, multi-select all datasets computed by Dataiku (all except the two leftmost uploaded files).
Near the bottom of the right panel’s Actions tab, select Change connection.
For the new connection field, select an available SQL connection.
Click Save.

Note

For more information, see Concept | Connection changes.

You’ll next want to build the Flow.

Click Flow Actions at the bottom right of the Flow.
Click Build all.
Keep the default settings and click Build.

Use case summary#

The Flow begins from two uploaded CSV files:

Dataset	Description
orders	Order information for t-shirts, including the order’s date, the type of t-shirt, and the customer who placed the order.
customers_stacked	Additional information about the customers who have placed a t-shirt order.

Use an SQL notebook#

In-database charts are one way to efficiently visualize a database dataset. In other cases, you’ll want the freedom of an SQL notebook for prototyping code and analyzing data.

Create an SQL notebook#

Let’s start by creating an SQL notebook.

From the Flow, select the orders_copy dataset.
Navigate to the Lab () tab of the right panel.
Under Code Notebooks, select New.
Select SQL as the language.
Click Create.

Important

You can attach one SQL notebook to only one SQL connection.

Run the default SQL query#

It’s helpful to start with a basic query to make sure you’re querying the correct table.

Click + Query to initiate the first query cell of the notebook.
Select the Tables tab on the left to see available tables.
Hover over the orders_copy table, and click the plus () icon to paste a starter query into the editor.
Click Run (or use the keyboard shortcut Cmd/Ctrl + enter) to run the query and observe the results.

Important

Your exact SQL query will differ based on factors such as:

Whether you are using Dataiku Cloud or a self-managed instance.
The schema of your SQL database.
Your project key.

Write an SQL query#

Now that you’ve identified the correct table name, let’s demonstrate a simple query grouping orders by t-shirt categories.

Copy-paste the following SQL query into the existing cell, adjusting the FROM clause for your instance and table name.

SELECT "tshirt_category", COUNT(*) AS "count"
FROM "{INSTANCE_ID}"."DKU_TUTORIAL_SQL_ORDERS_COPY"
GROUP BY "tshirt_category"

Above the query, click to name it order count by tshirt category.
Click Run to execute the query and observe the results.

Tip

In addition to query cells, an SQL notebook can also have markdown cells (+ Markdown), which can be useful for providing documenting SQL queries.

Use an SQL query recipe#

Queries in a notebook exist only in the Lab. They’re not part of the Flow. To create an output dataset from an SQL query, you’ll need to turn the query into an SQL code recipe.

Create an SQL query recipe#

Once you’re satisfied prototyping a query, you can turn it into a recipe to create an output in the Flow.

From within the SQL notebook, click + Create Recipe.
In the Output column, click Set, and name the output dataset orders_by_tshirt_category.
Click Create Dataset.
Click Create Recipe.

Run an SQL query recipe#

Now you have the same query, but in a recipe format with a defined output. This recipe will start a job that creates the output table in the database and fills it with the results of the query.

To make the query more readable, click Format Code near the bottom.
Although not required here, click Validate to have Dataiku parse the query, verify that the syntax is correct, and fetch the output dataset column names and types.
Click Run to build the output in the Flow.
Open the output dataset orders_by_tshirt_category when finished.

Important

At any time, you can delete this recipe from the Flow, and the original query remains in the notebook.

Iterate between an SQL notebook and a recipe#

Like with Python and R notebooks and recipes, you’re able to iterate between experimental code in an SQL notebook and production code in an SQL recipe.

Click Parent Recipe to return to the SQL recipe.
Click Edit in Notebook.
Now in the notebook, in the third line, change the name of the count column to tshirt_count.
Click Run again to be sure the query still works.
Click Save Back to Recipe to sync your changes back to the Flow.

Tip

As a matter of comparison, try reproducing the same output visually. Create a Group recipe on orders_copy dataset. Set tshirt_category as the group key and the default count as the only aggregation.

Next steps#

Congratulations! You’ve experimented writing SQL queries in a notebook and saw how to deploy them to the Flow with an SQL recipe.