Reference | Overview of compute engines on Dataiku Cloud#
On Dataiku Cloud, you can access different kinds of compute resources to execute jobs:
Run in-database (SQL)#
If you have connected a supported SQL database (such as Snowflake, Amazon Redshift, Google BigQuery, Microsoft SQL Server, or PostgreSQL), you can push computation down into the database. On Dataiku Cloud, the in-database engine should be your preferred compute engine for all eligible tasks.
Note that both the input and output datasets must be stored in the SQL database. Also note that this compute engine is only available for SQL-compatible jobs: SQL code recipes, most visual recipes, and Prepare recipes composed entirely of SQL-translatable processors.
Containerized execution on fully managed elastic AI compute#
Dataiku Cloud includes fully managed Elastic AI compute capacity based on Kubernetes (K8s) to execute various workloads through containerized execution.
We recommend containerized execution for all tasks where in-database computation is not possible, in particular Python notebooks and Spark jobs. More details on this compute engine are in this section.
Local execution#
Jobs can also be executed locally, using the same resources as the Dataiku application itself. Execution happens locally whenever “DSS - Local Stream” or “Use backend to execute” is selected.
By design, this type of execution competes with the Dataiku application for resources and can degrade its performance. In-memory processing is not recommended when you can leverage a database instead.
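As a mental model only, the preference order described above (in-database when eligible, otherwise containerized, with local execution reserved for explicit selection) can be sketched in a few lines of Python. The function name, connection names, and rules here are illustrative assumptions, not Dataiku's actual engine-selection implementation, which is more nuanced:

```python
# Illustrative sketch of the engine preference described above.
# All names and rules are assumptions for demonstration purposes;
# Dataiku's real engine selection considers many more factors.

SQL_CONNECTIONS = {"snowflake", "redshift", "bigquery", "sqlserver", "postgresql"}

def select_engine(input_connection: str, output_connection: str,
                  sql_translatable: bool, force_local: bool = False) -> str:
    """Pick a compute engine following the documented preference order."""
    if force_local:
        # Corresponds to explicitly selecting "DSS - Local Stream" /
        # "Use backend to execute"; not recommended for heavy workloads.
        return "local"
    if (input_connection in SQL_CONNECTIONS
            and output_connection == input_connection
            and sql_translatable):
        # Both datasets live on the same SQL connection and the job is
        # SQL-translatable: push computation down into the database.
        return "in-database"
    # Otherwise, fall back to fully managed elastic (Kubernetes) compute.
    return "containerized"

print(select_engine("snowflake", "snowflake", True))   # in-database
print(select_engine("snowflake", "s3", True))          # containerized
```

Changing any one condition (a non-SQL output connection, a non-translatable processor) is enough to shift the job off the in-database engine, which is why keeping inputs and outputs on the same SQL connection matters.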
See Tutorial | Recipe engines to learn more about how Dataiku selects the optimal engine to build your Flow.