Tutorial | SQL notebooks (SQL part 6)

In-database charts are one way to efficiently visualize a database dataset. In other cases, you’ll want the freedom of a SQL notebook for prototyping code and analyzing data.

Note

Before beginning this tutorial, you may wish to review the concept article on SQL notebooks.

Objectives

In this tutorial, you will:

  • create a SQL notebook and use it to run SQL queries.

Starting here?

This tutorial requires having the output dataset from part 4 in order to reproduce the steps here.

Create a SQL notebook

  • From the customers_enriched dataset, select Lab in the Actions tab of the right panel.

  • In the Code Notebooks section, click New.

  • Select SQL and click Create.

  • Click +Query to initiate the first query of the notebook.

  • In the Tables tab of the left panel, click on the plus icon to the right of the table name to paste a starter query for this table.

The query form is now populated with the following code, which selects all columns from the DKU_TUTORIAL_SQL_customers_enriched table in the dku_shirt schema.

SELECT * FROM "dku_tshirt"."DKU_TUTORIAL_SQL_customers_enriched"

Tip

As before, your own query may not have the dku_shirt schema depending on the configuration of your database.

If you click Run, the query executes and shows you a sample of the query results.

Sample of the query results.

Run more SQL queries in a notebook

We are interested in how Haiku T-Shirt customers respond to the campaign, and in particular across different age groups. We can quickly run some SQL queries to analyze this.

  • Click +Query to start a new query.

  • Copy the following code to the new query cell, and click Run.

SELECT campaign, AVG(total), SUM(total), COUNT(campaign)
    FROM "dku_tshirt"."DKU_TUTORIAL_SQL_customers_enriched"
    WHERE birthdate >= '1980-01-01' AND birthdate < '1990-01-01'
    GROUP BY campaign

The query pulls customers born in the 1980s and groups them by whether they are part of the campaign. The columns report:

  • campaign, since this is the GROUP BY variable, the two rows represent the two values indicating whether a customer is part of the campaign

  • avg, representing the average order placed by a customer. Customers who were part of the campaign placed orders averaging 97.40 in value, while those who were not placed orders averaging 34.49

  • sum, representing the total value of orders placed by customers in this group

  • count, representing the number of customers in this group

Sample of the query results.

Let’s try another.

  • Click +Query.

  • Copy the following code to the new query cell, and click Run.

SELECT campaign, AVG(total), SUM(total), COUNT(campaign)
    FROM "dku_tshirt"."DKU_TUTORIAL_SQL_customers_enriched"
    WHERE birthdate >= '1990-01-01' AND birthdate < '2000-01-01'
    GROUP BY campaign

The query results show summaries for customers born in the 1990’s. What is of immediate interest here is that there is a much smaller gap between customers who are and are not part of the campaign, versus customers born in the 80’s. This suggests that there might exist some interaction between age and the effect of the campaign.

Sample of the query results.

Note

All of these queries are part of a notebook that exists only in the Lab. To use a query from a notebook in the Flow, you can turn the query into a SQL code recipe by clicking Create Recipe.

What’s next?

Congratulations! If you’ve completed the entire series of tutorials, you saw how to:

  • create a SQL connection with Dataiku;

  • move data to a database using a Sync or Prepare recipe;

  • execute in-database operations either with visual recipes or SQL recipes;

  • explore datasets with in-database charts or notebooks.

Note

You can learn more about SQL notebooks in the reference documentation.