Hands-On Tutorial: SQL Notebooks

Note

This lesson is part of a series of “Usage of SQL and Dataiku tutorials” that begins with the Hands-On Tutorial: Sync Recipe.

Tip

This content is also included in a free Dataiku Academy course Dataiku & SQL, which is an optional part of the Core Designer learning path. Register for the course there if you prefer to track and validate your progress alongside concept videos, text summaries, hands-on tutorials, and quizzes.

SQL notebooks are useful for prototyping code and querying and analyzing data.

  • From the customers_enriched dataset, select Lab in the Actions tab of the right panel.

  • In the Code Notebooks section, click New.

  • Select SQL and click Create.

  • Click +Query to initiate the first query of the notebook.

  • In the Tables tab of the left panel, click on the plus icon to the right of the table name to paste a starter query for this table.

The query form is now populated with the following code, which selects all columns from the DKU_TUTORIAL_SQL_customers_enriched table in the dku_shirt schema.

SELECT * FROM "dku_tshirt"."DKU_TUTORIAL_SQL_customers_enriched"

Tip

As before, your own query may not have the dku_shirt schema depending on the configuration of your database.

If you click Run, the query executes and shows you a sample of the query results.

Sample of the query results.

We are interested in how Haiku T-Shirt customers respond to the campaign, and in particular across different age groups. We can quickly run some SQL queries to analyze this.

  • Click +Query to start a new query.

  • Copy the following code to the new query cell, and click Run.

SELECT campaign, AVG(total), SUM(total), COUNT(campaign)
    FROM "dku_tshirt"."DKU_TUTORIAL_SQL_customers_enriched"
    WHERE birthdate >= '1980-01-01' AND birthdate < '1990-01-01'
    GROUP BY campaign

The query pulls customers born in the 1980’s and groups them by whether they are part of the campaign. The columns report:

  • campaign, since this is the GROUP BY variable, the two rows represent the two values indicating whether a customer is part of the campaign

  • avg, representing the average order placed by a customer. Customers who were part of the campaign placed orders averaging 97.40 in value, while those who were not placed orders averaging 34.49

  • sum, representing the total value of orders placed by customers in this group

  • count, representing the number of customers in this group


Sample of the query results.

Let’s try another.

  • Click +Query.

  • Copy the following code to the new query cell, and click Run.

SELECT campaign, AVG(total), SUM(total), COUNT(campaign)
    FROM "dku_tshirt"."DKU_TUTORIAL_SQL_customers_enriched"
    WHERE birthdate >= '1990-01-01' AND birthdate < '2000-01-01'
    GROUP BY campaign

The query results show summaries for customers born in the 1990’s. What is of immediate interest here is that there is a much smaller gap between customers who are and are not part of the campaign, versus customers born in the 80’s. This suggests that there might exist some interaction between age and the effect of the campaign.

Sample of the query results.

Note

All of these queries are part of a notebook that exists only in the Lab. To use a query from a notebook in the Flow, you can turn the query into a SQL code recipe by clicking Create Recipe.