Concept | Computation engines



Computation has a cost, and Dataiku offers several computation strategies to reduce it. Dataiku can perform the computation itself, using the DSS engine, or push it down to external engines. In this way, Dataiku acts as an orchestrator of your connections’ engines, delegating computation to them when possible.


When you design a transformation, you are actually working with a sample of the dataset. When you are ready to apply the transformation steps to the whole dataset, click Run. Dataiku automatically selects the computation engine that best fits the underlying data storage and the operation you are applying to the dataset.


Recipe engines

Computation in Dataiku can take four main forms. To avoid running out of memory when manipulating large datasets, Dataiku recommends offloading the computation to where the data lives. This way you avoid bringing all the data into memory or streaming it through the Dataiku server.

In DSS Engine: in-memory or streamed

For in-memory execution, the data are stored in RAM. This strategy is used, for example, to execute Python or R recipes.

For streamed execution, Dataiku reads the input dataset as a stream of rows, applies the computation to the rows as they arrive, and writes the output dataset row by row.
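The difference between the two modes can be sketched in plain Python. This is only an illustration of the general idea, not Dataiku's implementation: the function names, the CSV layout, and the `total` column are assumptions made up for the example.

```python
import csv
import io


def run_in_memory(src, dst):
    """Load every row into RAM first, then transform and write (illustrative only)."""
    rows = list(csv.DictReader(src))  # the whole dataset is held in memory here
    out = [
        {"name": r["name"], "total": float(r["price"]) * int(r["qty"])}
        for r in rows
    ]
    writer = csv.DictWriter(dst, fieldnames=["name", "total"])
    writer.writeheader()
    writer.writerows(out)
    return len(out)


def run_streamed(src, dst):
    """Read, transform, and write one row at a time (illustrative only)."""
    writer = csv.DictWriter(dst, fieldnames=["name", "total"])
    writer.writeheader()
    count = 0
    for r in csv.DictReader(src):  # only the current row is held in memory
        writer.writerow(
            {"name": r["name"], "total": float(r["price"]) * int(r["qty"])}
        )
        count += 1
    return count


# Hypothetical sample data, used only to exercise both functions.
sample = "name,price,qty\nwidget,2.5,2\ngadget,1.0,3\n"
in_memory_out = io.StringIO()
streamed_out = io.StringIO()
run_in_memory(io.StringIO(sample), in_memory_out)
run_streamed(io.StringIO(sample), streamed_out)
```

Both functions write the same output. The streamed version keeps memory use roughly constant as the input grows, whereas the in-memory version needs room for the entire dataset.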

In-database

This strategy is used, for example, to execute SQL queries. For a visual recipe, Dataiku translates the recipe into an SQL query, which is then sent, or pushed down, to the SQL database for execution.
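To make the idea of push-down concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a SQL connection. The `orders` table, its columns, and the function names are hypothetical, and the code does not reproduce the SQL that Dataiku actually generates. It only contrasts pulling rows out of the database to aggregate them with letting the database do the aggregation.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (hypothetical data for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
    )
    conn.commit()
    return conn


def totals_on_client(conn):
    """Fetch every row and aggregate outside the database."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Push the aggregation down: only the grouped result leaves the database."""
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    return dict(conn.execute(query).fetchall())


conn = build_demo_db()
client_side = totals_on_client(conn)
pushed_down = totals_in_database(conn)
```

Both approaches return the same totals, but the pushed-down version transfers one row per customer instead of one row per order, which is where the savings come from on large tables.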

On Hadoop/Spark cluster

Depending on the engine you choose, the visual recipe is translated into a Hive, Impala, or Spark SQL query, which is then sent, or pushed down, to the Hadoop or Spark cluster.

In Kubernetes/Docker

This strategy is also in-memory or streamed, but the computation runs in containers, through Docker or Kubernetes clusters, rather than on the Dataiku host server.
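As a way to summarize the four strategies, the toy function below maps a recipe and its storage to one of them. It is a deliberately simplified sketch: the rules, the string labels, and the `suggest_strategy` name are assumptions made for illustration and do not describe how Dataiku actually selects an engine.

```python
def suggest_strategy(recipe_kind, storage, containerized=False):
    """Return a strategy label for a recipe (toy rules, NOT Dataiku's real logic).

    recipe_kind:   "code" (for example Python or R) or "visual"  -- hypothetical labels
    storage:       "sql", "hadoop", or "files"                   -- hypothetical labels
    containerized: whether container execution is enabled        -- hypothetical flag
    """
    if recipe_kind == "code":
        # Code recipes run in memory, on the host or in a container.
        return "container" if containerized else "dss-in-memory"
    if storage == "sql":
        # Visual recipe on SQL data: translate to SQL and push it down.
        return "in-database"
    if storage == "hadoop":
        # Visual recipe on cluster data: push down to Hive, Impala, or Spark SQL.
        return "cluster"
    # Otherwise, stream rows through the host or through a container.
    return "container" if containerized else "dss-streamed"
```

For instance, under these assumed rules, `suggest_strategy("visual", "sql")` returns `"in-database"`, reflecting the recommendation to run the computation where the data lives.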

Learn more

In this article, you learned about ways to optimize computation when working in Dataiku. Continue getting to know the basics of Dataiku by learning about jobs.