Tutorial | SQL notebooks and recipes#
Get started#
Although many visual recipes can be run in-database, at times you may want the flexibility of writing your own SQL queries. You can prototype such queries in an SQL notebook, and then use them in the Flow as SQL recipes.
Objectives#
In this tutorial, you will:
Query an SQL database from a notebook.
Run an SQL query in-database via an SQL query recipe.
Prerequisites#
Dataiku 12.0 or later.
An available SQL connection. If you don’t have one yet, see Tutorial | Configure a connection between Dataiku and an SQL database.
You may also wish to review Concept | SQL notebooks and Concept | SQL code recipes.
Create the project#
From the Dataiku Design homepage, click + New Project.
Select Learning projects.
Search for and select SQL Notebooks and Recipes.
Click Install.
From the project homepage, click Go to Flow (or
g
+f
).
From the Dataiku Design homepage, click + New Project.
Select DSS tutorials.
Filter by Core Designer.
Select SQL Notebooks and Recipes.
From the project homepage, click Go to Flow (or
g
+f
).
Note
You can also download the starter project from this website and import it as a zip file.
Change connections#
The datasets in the Flow are not yet synced to your specific SQL database. Let’s fix that first.
From the Flow, multi-select all datasets computed by Dataiku (all except the two leftmost uploaded files).
Near the bottom of the Actions sidebar, select Change connection.
For the new connection field, select an available SQL connection.
Click Save.
Note
For more information, see Concept | Connection changes.
You’ll next want to build the Flow.
Click Flow Actions at the bottom right of the Flow.
Click Build all.
Keep the default settings and click Build.
Use case summary#
The Flow begins from two uploaded CSV files:
Dataset |
Description |
---|---|
orders |
Order information for t-shirts, including the order’s date, the type of t-shirt, and the customer who placed the order. |
customers_stacked |
Additional information about the customers who have placed a t-shirt order. |
Use an SQL notebook#
In-database charts are one way to efficiently visualize a database dataset. In other cases, you’ll want the freedom of an SQL notebook for prototyping code and analyzing data.
Create an SQL notebook#
Let’s start by creating an SQL notebook.
From the Flow, select the orders_copy dataset.
In the right sidebar, navigate to the Lab.
Under Code Notebooks, select New.
Select SQL as the language.
Click Create.
Important
One SQL notebook can only be attached to one SQL connection.
Run the default SQL query#
It’s helpful to start with a basic query to make sure we are querying the correct table.
Click + Query to initiate the first query cell of the notebook.
Select the Tables tab on the left to see available tables.
Click the plus icon to the right of the orders_copy table name to paste a starter query into the editor.
Click Run near the top right (or use the keyboard shortcut
cmd/ctrl
+enter
) to run the query and observe the results.
Important
Your exact SQL query will appear differently depending on whether you are using Dataiku Cloud or a self-managed instance, the schema of your SQL database, and your project key.
Write an SQL query#
Now that we have identified the correct table name, let’s demonstrate a simple query grouping orders by t-shirt categories.
Copy-paste the following SQL query into the existing cell, adjusting the FROM clause for your instance and table name.
SELECT "tshirt_category", COUNT(*) AS "count" FROM "{INSTANCE_ID}"."DKU_TUTORIAL_SQL_ORDERS_COPY" GROUP BY "tshirt_category"
Above the query, click to name it
order count by tshirt category
.Click Run near the top right to run the query and observe the results.
Important
In addition to query cells, an SQL notebook can also have markdown cells (+ Markdown), which can be useful for providing commentary or discussion around SQL queries.
Use an SQL query recipe#
Queries in a notebook exist only in the Lab. They are not part of the Flow. To create an output dataset from an SQL query, we’ll need to turn the query into an SQL code recipe.
Create an SQL query recipe#
Once we’re satisfied prototyping a query, we can turn it into a recipe to create an output in the Flow.
Above the order count by tshirt category query, click Create Recipe.
In the Output column, click Set, and name the output dataset
orders_by_tshirt_category
.Click Create Dataset.
Click Create Recipe.
Run an SQL query recipe#
Now we have the same query, but in a recipe format with a defined output. This recipe will start a job that creates the output table in the database and fills it with the results of the query.
To make the query more readable, click Format Code near the bottom.
Although not required here, click Validate to have Dataiku parse the query, verify that the syntax is correct, and fetch the output dataset column names and types.
Click Run to build the output in the Flow, and then open the output dataset when finished.
Important
At any time, we can delete this recipe from the Flow, and the original query remains in the notebook.
Iterate between an SQL notebook and a recipe#
Like with Python and R notebooks and recipes, we are able to iterate between experimental code in an SQL notebook and production code in an SQL recipe.
Return to the SQL recipe found in the Flow.
Click Edit in Notebook near the top right.
Now in the notebook, in the third line, change the name of the count column to
tshirt_count
.Click Run again to be sure the query still works.
Click Save Back to Recipe to sync your changes back to the Flow.
Tip
As a matter of comparison, try reproducing the same output visually by creating a Group recipe on orders_copy dataset with tshirt_category as the group key and the default count as the only aggregation. The results should be the same.
What’s next?#
Congratulations! You’ve experimented writing SQL queries in a notebook and saw how to deploy them to the Flow with an SQL recipe.
See also
This was an example of an SQL query recipe. For more complex queries, such as those including CTE expressions, you may need an SQL script recipe. For more on the differences between SQL query and SQL script recipes, see the reference documentation.
To learn more about coding in Dataiku with other languages such as Python and R, see the Developer learning path or the Developer Guide.
Tip
You can find this content (and more) by registering for the Dataiku Academy course, Dataiku & SQL. When ready, challenge yourself to earn a certification!