Reference | Managing elastic AI compute capacity

Dataiku Cloud manages the infrastructure of your instance and provides elastic AI compute capabilities for containerized execution. These capabilities depend on your subscription. Three dimensions define these quotas: CPU, GB of RAM, and parallel activities. These dimensions act as limits, capping the maximum concurrent usage of those resources.

The Usage & Monitoring panel in your Launchpad reflects your quota. The quota is a common pool of resources shared by all users, so the capacity used by a task isn't available to others until that task finishes. If a new task requests more resources than are left available (on any of the three quotas: CPU, RAM, or parallel activities), that task is queued until the resources it requests have been freed.

When a user starts a job requesting containerized execution, the job launches one or several containers. This withdraws from your quota the CPU and RAM the containers use, as well as one parallel activity. The quota used by the containers is freed when the job finishes. Webapps and notebooks, however, must be closed (unloaded) before the resources they're using are freed.
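For example, a space administrator can free the resources held by loaded Jupyter notebooks through the public API. The sketch below is a minimal illustration, not an official procedure: it assumes a dataikuapi version exposing list_jupyter_notebooks() and unload(), and uses hypothetical host, API key, and project key values.

```python
import dataikuapi

# Hypothetical instance URL and API key: replace with your own.
client = dataikuapi.DSSClient("https://your-instance.app.dataiku.io", "YOUR_API_KEY")
project = client.get_project("MY_PROJECT")  # hypothetical project key

# List only the notebooks that currently have a loaded session.
for notebook in project.list_jupyter_notebooks(active=True, as_type="object"):
    # Unloading stops the kernel, which releases the CPU, RAM, and
    # parallel activity the notebook holds against the quota.
    notebook.unload()
```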

Note

Jobs on partitioned datasets launch several containers and count as several parallel activities. Dataiku processes each partition in a dedicated container and runs as many partitions simultaneously as your quota has parallel activities available. For example, a recipe building 30 partitions with 10 free parallel activities runs 10 containers at a time. You can limit the maximum number of parallel activities requested by the recipe in its Advanced tab.

Elastic AI Compute metrics

To better understand how the Elastic AI Compute quota is used, an extension is available. To install it:

  1. In the Cloud Launchpad, navigate to the Extensions panel.

  2. Click + Add an Extension.

  3. In the Advanced features category, select Elastic AI Compute Metrics.

  4. Click Add.

This extension creates, within your Design node, an S3 connection named elastic-ai-compute-metrics, usable by the space_administrators group.

To import the metrics dataset in a new project:

  1. Click on Connect or create.

  2. In Add new dataset, click on Amazon S3.

  3. In S3 connection, choose elastic-ai-compute-metrics.

  4. Click on Test to retrieve the format.

  5. Optionally, if you want to use daily partitioning:

    • Click on the Partitioning tab, then Activate Partitioning.

    • Fill in Pattern with year=%Y/month=%_M/day=%_D/.*

    • Click on Add Time Dimension, choose date and DAY as Period.

    • Click on List Partitions to validate the partitioning scheme.

  6. In New dataset name, fill in the name of your choice, and then click Create.
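If you'd rather script this setup than click through the UI, the sketch below shows roughly how an equivalent dataset could be created with dataikuapi. It's a hedged illustration: the host, API key, project key, and dataset name are placeholders, and the format settings are an assumption (in the UI, Test detects the real format for you).

```python
import dataikuapi

# Hypothetical instance URL and API key: replace with your own.
client = dataikuapi.DSSClient("https://your-instance.app.dataiku.io", "YOUR_API_KEY")
project = client.get_project("MY_PROJECT")  # hypothetical project key

# Point a new dataset at the S3 connection installed by the extension.
dataset = project.create_dataset(
    "elastic_ai_compute_metrics",  # dataset name of your choice
    type="S3",
    params={"connection": "elastic-ai-compute-metrics", "path": "/"},
    formatType="csv",  # assumption: use the format that Test detects
    formatParams={"separator": "\t", "parseHeaderRow": True},
)
```

Partitioning can then be activated in the dataset's settings, as in step 5 above.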

The resulting dataset contains one line per minute for every running containerized execution (pod), with the following columns:

| Column | Description |
| --- | --- |
| timestamp | The timestamp of the measurement (one line per minute per running pod). |
| dss_version | The three-part DSS version (for example, 14.1.4). |
| namespace | The K8s namespace where the execution is being done. Values can be space-xxxxxxxx-dku-compute for Design and pre-production nodes, space-xxxxxxxx-dku-compute-automation for Automation nodes, or space-xxxxxxxx-dku-compute-gpu for GPU. |
| node_id | The node for which the pod is running. Values can be design, automation, or pre-prod-automation. |
| project_key | The key (in lowercase) of the project that launched the containerized execution (pod). |
| pod | The name of the pod as submitted from the Dataiku node to the Elastic AI Compute cluster. |
| requested_cpu | The CPU value of the containerized execution configuration. Empty for GPU-type containerized execution configurations. |
| requested_ram | The RAM value of the containerized execution configuration, expressed in bytes. Empty for GPU-type containerized execution configurations. |
| requested_gpu | The number of GPUs in the containerized execution configuration. Only filled for GPU-type containerized execution configurations. |
| used_cpu | The real CPU usage of the container (averaged over the last 5 minutes). Can be empty while the pod is starting. |
| used_ram | The real RAM usage at the timestamp (not an average). Can be empty while the pod is starting. |
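As an example of how to use this dataset, the hedged sketch below computes the peak concurrent resource usage per minute, which you can compare against your quota. It's meant to run in a Python recipe or notebook on your instance; the dataset name elastic_ai_compute_metrics is an assumption matching the import steps above.

```python
import dataiku

# Read the metrics dataset created above (hypothetical name).
df = dataiku.Dataset("elastic_ai_compute_metrics").get_dataframe()

# Each row is one pod for one minute, so summing requested resources per
# timestamp approximates the concurrent usage at that minute. Counting
# distinct pods only approximates parallel activities, since a job with
# several containers withdraws a single parallel activity.
concurrent = df.groupby("timestamp").agg(
    cpu=("requested_cpu", "sum"),
    ram_bytes=("requested_ram", "sum"),
    pods=("pod", "nunique"),
)

# Peak concurrent usage over the period covered by the dataset.
print(concurrent.max())
```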

See also

Additional columns execution_type, job_id, activity_id, execution_id, install_id, code_studio_id, webapp_id, analysis_id, mltask_id, mltask_session_id, and submitter are defined in the pod labeling page of the reference documentation.
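These label columns make usage attribution straightforward. For instance, the hedged sketch below sums requested CPU by project and execution type; the actual execution_type values depend on your workloads.

```python
import dataiku

df = dataiku.Dataset("elastic_ai_compute_metrics").get_dataframe()

# Attribute requested CPU to projects and execution types (jobs,
# notebooks, webapps, and so on, depending on your workloads).
by_origin = (
    df.groupby(["project_key", "execution_type"])["requested_cpu"]
    .sum()
    .sort_values(ascending=False)
)
print(by_origin.head(10))
```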

In addition, activating the extension automatically installs in your space a project named Cloud Resources Usage Monitoring. It provides a generic dashboard to understand your Elastic AI Compute usage globally and per project.

Note

This dataset is retained for one year. If you need longer retention, we advise syncing the data to a connection where you can manage retention yourself.
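One hedged way to do this is a small Python recipe that copies the metrics into a dataset on a connection you control; both dataset names below are assumptions for illustration.

```python
import dataiku

# Source: the metrics dataset (hypothetical name, see the import steps above).
source = dataiku.Dataset("elastic_ai_compute_metrics")
# Target: a dataset created on a long-lived connection you own, where
# retention is up to you (hypothetical name).
target = dataiku.Dataset("elastic_ai_compute_metrics_archive")

df = source.get_dataframe()

# Overwrites on each run by default; switch the output to append mode in
# the recipe's Inputs/Outputs settings to accumulate history over time.
target.write_with_schema(df)
```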