Instance Templates

This section explains how to modify the settings in your instance templates while avoiding common pitfalls.

Note

You can also find resources on setup actions used to configure instances.

Reference | Fleet blueprints

To quickly deploy pre-configured Dataiku instances, or complete fleets, you can use a Fleet blueprint. A Fleet blueprint uses pre-configured instance templates and virtual networks.

  • Instance templates tell Fleet Manager how to deploy the Dataiku DSS (Dataiku) instances linked to the template.

  • Virtual networks tell Fleet Manager where to deploy your Dataiku instances.

You can modify your instance templates and virtual networks and create new ones. Instance templates are not tied to a specific virtual network. Fleet Manager lets you know if the modifications you make to the properties of an instance template or virtual network require reprovisioning your Dataiku instances to take effect.

Note

Before making any modifications, review the essential considerations, including the impact of upgrading or reprovisioning an instance.

Tip | Modifying instance templates and settings

Fleet Manager provides a single user interface for managing your Dataiku instances and modifying instance settings. While this allows for flexibility, the modifications you make can significantly impact your Dataiku users or even cause unwanted results.

In general, you can modify any instance setting that does not impact the operation of Dataiku. You can make modifications to your Dataiku instances by modifying the instance templates, virtual networks, or the instance settings themselves.

Note

Be aware of how such modifications impact disk sizes and other elements, such as the SaaS model, the Dataiku instance lifecycle, monitoring tools, and Dataiku releases and security patches.

Tip | The impact of instance template modifications on disk sizes

DSS instances are based on a data disk and an Operating System (OS) disk. The data disk holds all the state DSS needs to run, which is why Fleet Manager snapshots only the data disk. It is the only disk that matters when provisioning or reprovisioning an instance because the OS disk is always replaced at provisioning time.

Caution

You should avoid storing anything outside the data disk because when you upgrade or reprovision an instance, everything stored outside the data disk is lost.

Data disk

The data disk contains all the DSS configuration and its data files. On AWS, Fleet Manager uses Amazon Elastic Block Store (EBS) volumes as the storage layer for the data disk.

It’s possible to set a starting size for the data disk and the maximum size the disk is allowed to reach. The Fleet Manager agent in the DSS instance will automatically grow the disk whenever the space occupied reaches 80% until it reaches the maximum allowed size.
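The growth rule above can be pictured with a small sketch. It models only the decision the text describes: grow once usage reaches 80% of the current size, and never go beyond the maximum. The function name, the parameter names, and the fixed growth step are illustrative assumptions, not the Fleet Manager agent's actual implementation.

```python
def next_disk_size_gb(used_gb: float, size_gb: int, max_size_gb: int,
                      threshold: float = 0.80, step_gb: int = 10) -> int:
    """Return the disk size after applying the growth rule once.

    Hypothetical model: `step_gb` is an assumed increment; the real agent's
    growth step is not documented here.
    """
    if size_gb >= max_size_gb:
        return size_gb  # already at the maximum allowed size
    if used_gb / size_gb < threshold:
        return size_gb  # usage is below the threshold, nothing to do
    return min(size_gb + step_gb, max_size_gb)  # grow, capped at the maximum


print(next_disk_size_gb(used_gb=40, size_gb=50, max_size_gb=100))  # at 80%: grows
print(next_disk_size_gb(used_gb=30, size_gb=50, max_size_gb=100))  # at 60%: unchanged
print(next_disk_size_gb(used_gb=95, size_gb=95, max_size_gb=100))  # capped at the maximum
```

Under this model, a generous maximum size mainly acts as a safety ceiling, while the starting size determines how soon the first growth happens.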

Even though it’s not best practice to store data in local filesystem connections, sometimes it’s convenient for small datasets or lookups. Furthermore, DSS will need a reasonably sized data disk to store logs, code environments, and anything else that cannot be offloaded to cloud storage.

OS disk

The OS disk is where the OS and other binaries are installed. It can be considered temporary because it is replaced every time the instance is reprovisioned. However, a reasonably sized OS disk (20 GB to 50 GB) is still worthwhile because Python and R packages, along with ML models, might use the OS’s default temp folder to store temporary files. There are ways to alter this behavior, but unfortunately, not all packages and tools abide by the same conventions.
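One common way to alter this behavior for Python code is the TMPDIR environment variable, which Python's standard tempfile module consults when choosing its default directory. The sketch below is illustrative only: the directory it uses is a stand-in created for the demonstration, not a path defined by Fleet Manager, and tools that ignore TMPDIR will keep writing to the OS default location.

```python
import os
import tempfile

# Stand-in for a directory on the data disk; the real path on your instance
# is an assumption you would need to fill in yourself.
scratch_dir = os.path.realpath(tempfile.mkdtemp(prefix="scratch-"))

os.environ["TMPDIR"] = scratch_dir
tempfile.tempdir = None  # clear the cached default so TMPDIR is re-read

# New temporary files now land in scratch_dir instead of the OS default.
with tempfile.NamedTemporaryFile() as handle:
    temp_file_dir = os.path.dirname(os.path.realpath(handle.name))

print(tempfile.gettempdir())
print(temp_file_dir == scratch_dir)
```

Setting the variable inside a running process only affects that process and its children; to cover every process it would have to be set in the environment they start from.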

Tip | The impact of instance template modifications on other elements

SaaS model

The deployment model offered by Fleet Manager is most similar to a SaaS model, where Dataiku is used as a service deployed by its management tool using settings and assets that have been configured earlier by Dataiku.

Dataiku instance lifecycle

A Dataiku instance is destroyed and reprovisioned many times during its lifecycle. Instances are temporary, and only the data disk is kept when reprovisioning or upgrading.

Monitoring tools

Since Dataiku instances are temporary, it is best practice to minimize customizations and the installation of monitoring tools. It is okay to install lightweight agents, such as those that register the Dataiku instance in your organization’s network. You can do this by running Ansible tasks in the Setup actions in your instance template.

Dataiku releases and security patches

Fleet Manager follows the same release cycle as Dataiku DSS (Dataiku). The image template (such as the AMI or Azure image template) is updated at every Dataiku release. The image template is configured with the best settings for Dataiku and the latest security patches available at the time of creation.

How-to | Create or modify an instance template

A Dataiku instance is always launched from an instance template. Instances deployed from the same instance template share common properties. To modify these common properties, modify the instance template settings and then reprovision each instance linked to the template.

To view an instance template:

  • Launch Fleet Manager.

  • Under Settings, choose Instance templates.

You can create a new template or choose to modify an existing template.

[Screenshots: creating a new instance template on AWS and on Azure]

Note

See the section on setup actions for instance templates to learn about configuring an instance.

How-to | Grant SSH access

On AWS, you can define the name of the AWS key pair to allow SSH access. Since Fleet Manager does not support multiple AWS accounts, this key pair must be defined in the same AWS account used to set up Fleet Manager.

To grant SSH access:

  • Navigate to AWS security > SSH access.

  • In AWS key pair name, enter the name of the key pair.

On Azure, you can add a public key to allow SSH access to Dataiku. Since Fleet Manager does not support multiple Azure accounts, this public key must be defined in the same Azure account used to set up Fleet Manager.

Note

You can authorize additional SSH keys using setup actions.

To grant SSH access:

  • Navigate to Azure security > SSH access.

  • In SSH Key, provide a public SSH key.

This grants SSH access as the “centos” user, which can run sudo commands. To authorize additional SSH keys, use the “Add authorized SSH key” setup action.

How-to | Grant security roles

There are two ways to grant your Dataiku instances access to AWS services:

  • IAM roles

  • AWS access key

Best practice is to use an IAM role and its instance profile rather than an AWS access key. The IAM instance profile is created automatically when you create a role from the AWS console.

You can assign IAM roles to each instance linked to the instance template. The benefits of assigning IAM roles are:

  • Avoid unnecessary sharing of long-term access keys

  • Simpler to maintain than access keys

It is possible to have one role assigned at startup (before the Dataiku instance starts up) and another one at runtime (after the Dataiku instance starts up). This helps to limit the scope of the IAM role while the instance is running.

Instance IAM Role

To assign Instance IAM roles:

  • Navigate to AWS security > Instance IAM role.

  • In Runtime instance profile ARN, provide the instance profile ARN (not a role ARN).

  • In Startup instance profile ARN, provide the instance profile ARN (not a role ARN).

  • Select the Restrict metadata access checkbox to prevent end-user processes from accessing the AWS metadata server.

    • This ensures the Dataiku end users cannot assume the instance role.
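Because both fields expect an instance profile ARN rather than a role ARN, a quick format check before pasting can catch the most common mix-up. The sketch below relies only on the standard IAM ARN layout, where instance profiles use the `instance-profile/` resource type and roles use `role/`. The helper name, the account ID, and the profile names are placeholders for illustration, and the check does not confirm that the profile actually exists.

```python
import re

# IAM instance profile ARNs use the "instance-profile/" resource type,
# whereas role ARNs use "role/". The partition may be aws, aws-cn, or aws-us-gov.
_INSTANCE_PROFILE_ARN = re.compile(
    r"arn:aws[a-z-]*:iam::\d{12}:instance-profile/[\w+=,.@/-]+"
)


def is_instance_profile_arn(arn: str) -> bool:
    """Return True if `arn` has the shape of an IAM instance profile ARN."""
    return _INSTANCE_PROFILE_ARN.fullmatch(arn.strip()) is not None


# The account ID and names below are placeholders.
print(is_instance_profile_arn("arn:aws:iam::123456789012:instance-profile/dss-runtime"))  # True
print(is_instance_profile_arn("arn:aws:iam::123456789012:role/dss-runtime"))  # False
```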

Keypair (Access Key)

You can use an AWS access key to give your Dataiku instances access to AWS services.

If you prefer to use an AWS access key rather than an IAM role, provide your ASM secret ID so that Fleet Manager can retrieve the secret access key from AWS Secrets Manager (ASM). Alternatively, Fleet Manager can encrypt and store the secret access key itself, using the Customer Managed Key (CMK) defined in the cloud setup settings.

To assign an AWS access key:

  • Navigate to AWS security > Keypair.

  • In Keypair mode, choose AWS Keypair.

  • In Keypair storage mode, choose an option.

    • Secret stored in ASM.

      • Enter your ASM secret ID.

      • Enter your AWS access key ID.

    • Secret stored encrypted in Fleet Manager.

      • Enter your AWS access key ID.

      • Enter your AWS secret access key.

User-Assigned Managed Identities

You can assign user-assigned managed identities to each instance linked to the instance template.

Note

You created user-assigned managed identities when you set up Fleet Manager. Visit the reference documentation for more information.

It is possible to have one managed identity assigned at startup (before the instance starts up) and another one at runtime (after the instance starts up). This helps to limit the scope of the managed identity while the instance is running.

To assign user-assigned managed identities:

  • Navigate to Azure security > User-assigned managed identities.

  • In Runtime managed identity, provide the user-assigned managed identity.

  • In Startup managed identity, provide the user-assigned managed identity.

  • Select the Restrict metadata access checkbox to prevent end-user processes from accessing the Azure metadata server.

    • This ensures Dataiku end users cannot use the instance’s managed identity.

How-to | Use the license override setting

You can use the license override setting to apply a Dataiku license file to each instance linked to the template. Alternatively, you can specify a license file for each instance.

  • Navigate to License override (Optional) in the template.

  • In License file, select Enter License.

  • Enter your license file. Be sure to copy the entire contents of the JSON file, including the final ‘}’.

The Fleet Manager agent applies the license file to each instance linked to the template once you have saved your changes and reprovisioned each instance.

To update a license file, repeat these steps.
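Since the steps above stress copying the entire JSON content, including the final ‘}’, it can help to confirm that the pasted text parses before saving. The sketch below is a generic completeness check using Python's standard json module. The helper name and the sample content are placeholders, and a real license file has its own fields that this check does not inspect.

```python
import json


def is_complete_json_object(text: str) -> bool:
    """Return True if `text` parses as a single JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


# Placeholder content; a real license file has its own fields.
pasted_ok = '{"example": {"key": "value"}}'
pasted_truncated = '{"example": {"key": "value"}'  # final "}" missing

print(is_complete_json_object(pasted_ok))         # True
print(is_complete_json_object(pasted_truncated))  # False
```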

Note

You can learn more about license file management in this section of the admin guide.