Concept Summary: SQL Recipe

In the previous video, you learned how to execute custom SQL code in a Dataiku DSS Flow using the SQL recipe. Let’s now summarize the key points of the video before continuing on to the next lesson.

An SQL recipe is useful for maintaining legacy code in a DSS Flow or for executing complex transformations that cannot be done in a single visual recipe, all while using an underlying database execution engine.

SQL recipes in DSS can be executed in two ways: as a Query or as a Script.

[Image: sql-recipe-choices.png]

SQL Query

To create an SQL query, specify the input and output datasets, and the storage location for the output dataset. This storage location can use a different database connection than the connection used by the input dataset.

[Image: create-sql-recipe.png]

Creating the query recipe opens up a code editor that contains a SELECT statement which you can edit to build your query. You can then Validate your code to check for syntax errors.

Before running the query, note that Dataiku DSS uses the primary (outermost) SELECT statement to create the output table and insert the query results into it.

[Image: sql-query-select-stmnt.png]

When you Run the query, DSS writes this table into the storage location that you specified for the output dataset.

Because DSS handles the table creation or deletion, insertion into the output table, and the automatic detection of the table schema, an SQL query allows you to focus on writing the main query.
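
To make this division of labor concrete, here is a minimal sketch that emulates it outside of DSS, using Python's built-in sqlite3 module as a stand-in database. The table names, the sample rows, and the materialize helper are all hypothetical; this is not how DSS is implemented, only an illustration of the idea that you write the SELECT and the tool manages the output table.

```python
import sqlite3

# Hypothetical input table standing in for a DSS input dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 7.5)],
)

# The only thing you would write in a query recipe: the main SELECT.
user_query = """
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
"""


def materialize(conn, select_sql, output_table):
    """Hypothetical helper: drop, recreate, and fill the output table.

    The output schema is derived from the SELECT itself, which loosely
    mirrors the automatic schema detection described above.
    """
    conn.execute(f"DROP TABLE IF EXISTS {output_table}")
    conn.execute(f"CREATE TABLE {output_table} AS {select_sql}")


materialize(conn, user_query, "orders_by_customer")
rows = conn.execute(
    "SELECT customer_id, total_amount FROM orders_by_customer "
    "ORDER BY customer_id"
).fetchall()
print(rows)
```

The query text contains no CREATE, DROP, or INSERT statement; everything about the output table lives in the helper.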

SQL Script

In the case of an SQL Script, however, DSS does not manage the input or output tables. This means that your code must include DROP, CREATE, and INSERT statements to ensure that your script is reproducible.

Furthermore, the output of an SQL script must be written to the same database where the input data resides.
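
As a rough illustration of why the DROP, CREATE, and INSERT statements matter, the sketch below runs a small script twice against an in-memory SQLite database through Python's built-in sqlite3 module. The table names and sample rows are hypothetical, and SQLite is only a stand-in for whichever database your connection uses, so the exact syntax may differ there.

```python
import sqlite3

# Hypothetical input table; the output is written to the same database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10.0), (1, 15.0), (2, 7.5);
""")

# In a script, the code itself manages the output table.
script = """
DROP TABLE IF EXISTS orders_by_customer;
CREATE TABLE orders_by_customer (customer_id INTEGER, total_amount REAL);
INSERT INTO orders_by_customer
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id;
"""

# Run the script twice. The DROP makes the second run start from a clean
# table, so the result does not depend on how many times the script ran.
conn.executescript(script)
conn.executescript(script)

row_count = conn.execute(
    "SELECT COUNT(*) FROM orders_by_customer"
).fetchone()[0]
print(row_count)
```

Without the DROP statement, the second run would fail on the CREATE because the table already exists; without both the DROP and the CREATE, each run would append duplicate rows.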

Recommendation

In general, we recommend that you use the SQL Query over the SQL Script, for the reasons just discussed. There are two exceptions to this recommendation:

  • When your SQL code has Common Table Expressions (WITH clauses) that cannot be rewritten.

  • When you are working with a data type that is not natively supported by DSS.
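
For the first exception, it can help to see what rewriting a Common Table Expression looks like. The sketch below, again using Python's built-in sqlite3 module with hypothetical table names and data, compares a simple WITH clause to an equivalent subquery in the FROM clause. It assumes a non-recursive expression that is referenced only once; other expressions, such as recursive ones, generally cannot be rewritten this way.

```python
import sqlite3

# Hypothetical input table for the comparison.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10.0), (1, 15.0), (2, 7.5);
""")

# Version 1: the statement begins with a WITH clause.
with_cte = """
WITH totals AS (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id, total_amount
FROM totals
WHERE total_amount > 10
ORDER BY customer_id
"""

# Version 2: the same logic as a subquery, so the statement begins
# with SELECT.
as_subquery = """
SELECT customer_id, total_amount
FROM (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
) AS totals
WHERE total_amount > 10
ORDER BY customer_id
"""

cte_rows = conn.execute(with_cte).fetchall()
subquery_rows = conn.execute(as_subquery).fetchall()
print(cte_rows == subquery_rows)
```

When a rewrite like this is not possible, the SQL Script is the fallback described above.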

For more information, see SQL recipes in the product documentation.