Impact of Modifying Instance Templates and Settings

Fleet Manager provides a single user interface for managing your Dataiku instances and modifying instance settings. While this allows for flexibility, the modifications you make can significantly impact your Dataiku users or even cause unwanted results.

In general, you can modify any instance setting that does not impact the operation of Dataiku. You can change your Dataiku instances by editing the instance templates, the virtual networks, or the instance settings themselves.

This article covers the following topics and how each one affects, or is affected by, modifications:

  • Disk Sizes

  • SaaS Model

  • Dataiku Instance Lifecycle

  • Monitoring Tools

  • Dataiku Releases and Security Patches

Disk Sizes

Each DSS instance has two disks: a data disk and an operating system (OS) disk. The data disk holds all the state that DSS needs to run, which is why Fleet Manager snapshots only the data disk. It is also the only disk that matters when provisioning or reprovisioning an instance, because the OS disk is always replaced at provisioning time.

Caution

Avoid storing anything outside the data disk. When you upgrade or reprovision an instance, everything stored outside the data disk is lost.
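As a quick illustration of this rule, the following Python sketch checks whether a given path sits under the data disk and would therefore be kept. The function name and the `/data` mount point are hypothetical placeholders, not values taken from Fleet Manager; substitute the mount point used on your own instance.

```python
from pathlib import PurePosixPath


def survives_reprovisioning(path, data_disk_mount):
    """Return True if `path` is located on the data disk.

    Both arguments are expected to be absolute, already-normalized POSIX
    paths; symlinks and ".." segments are not resolved here.
    """
    candidate = PurePosixPath(path)
    root = PurePosixPath(data_disk_mount)
    # Compare whole path components so that "/database" is not mistaken
    # for a child of "/data".
    return candidate == root or root in candidate.parents


# "/data" is an assumed mount point used only for this example.
print(survives_reprovisioning("/data/dataiku/config", "/data"))
print(survives_reprovisioning("/home/user/notes.txt", "/data"))
```

Any file for which this check returns False should be treated as disposable.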

Data Disk

The data disk contains all the DSS configuration and its data files. On AWS, Fleet Manager uses Elastic Block Store (EBS) volumes as the storage layer for the data disk.

You can set a starting size for the data disk and a maximum size that it is allowed to reach. The Fleet Manager agent in the DSS instance automatically grows the disk whenever the used space reaches 80% of the disk, until the disk reaches the maximum allowed size.
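The growth rule can be sketched in Python as follows. This is only an illustration of the behavior described here, not the agent's actual implementation. The 80% threshold and the maximum size come from the text, while the function name and the `growth_factor` of 1.5 are assumptions made for the example, since the size of each growth step is not stated.

```python
import math


def next_disk_size_gb(used_gb, current_gb, max_gb,
                      threshold=0.80, growth_factor=1.5):
    """Return the disk size after one check of the auto-grow rule.

    `growth_factor` is a hypothetical value chosen for illustration only.
    """
    if current_gb <= 0:
        raise ValueError("current_gb must be positive")
    if current_gb >= max_gb:
        # Already at (or above) the cap: never grow further, never shrink.
        return current_gb
    if used_gb / current_gb < threshold:
        # Usage is still below the threshold: leave the disk alone.
        return current_gb
    # Grow, but never beyond the configured maximum.
    return min(math.ceil(current_gb * growth_factor), max_gb)


print(next_disk_size_gb(used_gb=70, current_gb=100, max_gb=500))
print(next_disk_size_gb(used_gb=80, current_gb=100, max_gb=500))
print(next_disk_size_gb(used_gb=390, current_gb=400, max_gb=500))
```

The practical consequence is that the maximum size acts as a hard ceiling. Once the disk reaches it, no further growth happens, so set the maximum with enough headroom for logs and code environments.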

Even though it’s not best practice to store data in local filesystem connections, sometimes it’s convenient for small datasets or lookups. Furthermore, DSS will need a reasonably sized data disk to store logs, code environments, and anything else that cannot be offloaded to cloud storage.

OS Disk

The OS disk is where the OS and other binaries are installed. The OS disk can be considered temporary because it is replaced every time the instance is reprovisioned. However, it is still worth giving the OS disk a reasonable size (20 GB to 50 GB), because Python and R packages, along with ML models, might use the OS’s default temp folder to store temporary files. There are ways to alter this behavior, but unfortunately, not all packages and tools abide by the same conventions.
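One such way, for Python code, is to redirect the standard `tempfile` module through the `TMPDIR` environment variable. The sketch below shows the idea; the helper name `use_temp_dir` and the example path are hypothetical. As noted above, only tools that honor `TMPDIR` are affected.

```python
import os
import tempfile


def use_temp_dir(path):
    """Point Python's tempfile module at `path` and return the effective temp dir."""
    os.makedirs(path, exist_ok=True)
    # Child processes started afterwards inherit this variable as well.
    os.environ["TMPDIR"] = path
    # tempfile caches its choice; clearing the cache forces TMPDIR to be re-read.
    tempfile.tempdir = None
    return tempfile.gettempdir()


if __name__ == "__main__":
    # Hypothetical location; choose a directory with enough free space.
    print(use_temp_dir(os.path.join(os.getcwd(), "scratch-tmp")))
```

Packages that hard-code `/tmp` or use their own cache directories ignore this setting, which is why a reasonably sized OS disk is still recommended.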

SaaS Model

The deployment model offered by Fleet Manager is most similar to a SaaS model: Dataiku is consumed as a service that is deployed by its management tool, using settings and assets preconfigured by Dataiku.

Dataiku Instance Lifecycle

A Dataiku instance is destroyed and reprovisioned many times during its lifecycle. Instances are temporary, and only the data disk is kept when reprovisioning or upgrading.

Monitoring Tools

Since Dataiku instances are temporary, the best practice is to minimize customizations and the installation of monitoring tools. It is fine to install lightweight agents, such as those that register the Dataiku instance in your organization’s network. You can do this by running Ansible tasks in the Setup actions of your instance template.

Dataiku Releases and Security Patches

Fleet Manager follows the same release cycle as Dataiku DSS. The image template (such as the AMI or Azure image template) is updated at every Dataiku release. The image template is configured with the best settings for Dataiku and the latest security patches available at the time of creation.