Reference | Overview of compute engines on Dataiku Cloud#
On Dataiku Cloud, you can access different kinds of compute resources to execute jobs:
Run in-database (SQL)#
If you have connected a supported SQL database (such as Snowflake, Amazon Redshift, Google BigQuery, Microsoft SQL Server, or PostgreSQL), you can push computation down into the database. On Dataiku Cloud, the in-database engine should be your preferred compute engine for all eligible tasks.
Note that both the input and output datasets must be stored in the SQL database. Also note that this compute engine is only available for SQL-compatible jobs: SQL code recipes, most visual recipes, and Prepare recipes composed entirely of SQL-translatable processors.
Containerized execution on fully managed elastic AI compute#
Dataiku Cloud includes fully managed Elastic AI compute capacity based on Kubernetes (K8s) to execute various workloads through containerized execution.
We recommend containerized execution for all tasks where in-database computation is not possible, in particular Python notebooks and Spark jobs. More details on this compute engine are in this section.
Local execution#
Jobs can also be executed locally, using the same resources as the Dataiku application itself. Execution happens locally whenever “DSS - Local Stream” or “Use backend to execute” is selected.
By design, this type of execution competes with the Dataiku application for resources and can degrade its performance. In-memory processing is not recommended when you can leverage a database instead.
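As a mental model only, the preference order described above (in-database when eligible, otherwise containerized, with local execution reserved for explicit selection) can be sketched in a few lines of Python. The function name, connection names, and rules here are illustrative assumptions, not Dataiku's actual engine-selection implementation, which is more nuanced:

```python
# Illustrative sketch of the engine preference described above.
# All names and rules are assumptions for demonstration purposes;
# Dataiku's real engine selection considers many more factors.

SQL_CONNECTIONS = {"snowflake", "redshift", "bigquery", "sqlserver", "postgresql"}

def select_engine(input_connection: str, output_connection: str,
                  sql_translatable: bool, force_local: bool = False) -> str:
    """Pick a compute engine following the documented preference order."""
    if force_local:
        # Corresponds to explicitly selecting "DSS - Local Stream" /
        # "Use backend to execute"; not recommended for heavy workloads.
        return "local"
    if (input_connection in SQL_CONNECTIONS
            and output_connection == input_connection
            and sql_translatable):
        # Both datasets live on the same SQL connection and the job is
        # SQL-translatable: push computation down into the database.
        return "in-database"
    # Otherwise, fall back to fully managed elastic (Kubernetes) compute.
    return "containerized"

print(select_engine("snowflake", "snowflake", True))   # in-database
print(select_engine("snowflake", "s3", True))          # containerized
```

Changing any one condition (a non-SQL output connection, a non-translatable processor) is enough to shift the job off the in-database engine, which is why keeping inputs and outputs on the same SQL connection matters.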
See Tutorial | Recipe engines to learn more about how Dataiku selects the optimal engine to build your Flow.