Concept | SQL code recipes#

Watch the video

An SQL recipe is useful for maintaining legacy code in a Dataiku Flow or for executing complex transformations that can’t be done in a single visual recipe, all while using an underlying database execution engine.

You can execute SQL recipes in Dataiku in two different ways — either as a Query or as a Script. Here is a summary of the different features defining these two options.

SQL Query

SQL Script

Can write output to any kind of dataset

Can only write to the database

Runs 100% in-database when output is in database

Always runs 100% in-database

DSS handles table creation/drop

You must handle table creation/drop

Automatic code validation

No code validation

Support most SQL constructs

Supports arbitrarily complex SQL scripts

Does not support CTE (WITH) expressions

Use for cases not covered by SQL query

SQL query#

To create an SQL query, specify the input and output datasets, and the storage location for the output dataset. This storage location can use a different database connection than the connection used by the input dataset.

../../_images/create-sql-recipe.png

Creating the query recipe opens up a code editor that contains a SELECT statement which you can edit to build your query. You can then Validate your code to check for syntax errors.

Before running the query, note that Dataiku will use the primary or most encompassing SELECT statement to create and insert the query results into an output table.

../../_images/sql-query-select-stmnt.png

When you Run the query, Dataiku writes this table into the storage location that you specified for the output dataset.

Dataiku handles the table creation or deletion, the insertion into the output table, and the automatic detection of the table schema. This means that an SQL query allows you to focus on writing the main query.

SQL script#

In the case of an SQL script, however, Dataiku doesn’t manage the input or output tables. This means that your code must include: DROP, CREATE, and INSERT statements, to ensure that your script is reproducible.

Furthermore, you must write the output of an SQL script to the same database where the input data resides.

Recommendation#

In general, try using the SQL query over the SQL script for the reasons just discussed. There are two exceptions to this recommendation:

  • When your SQL code has Common Table Expressions or WITH statements that can’t be re-written.

  • When you are working with a data type that’s not natively supported by Dataiku.

Next steps#

You know how to leverage SQL in Dataiku. For developers, it is possible to go beyond this: for practical exercices see Leveraging SQL in Python & R or Performing SQL, Hive and Impala queries

See also

For more information, see SQL recipes in the reference documentation.