Deploying a Dataiku Instance to Cloud Stacks on AWS

In this article, we’ll walk through a step-by-step process to set up and deploy an instance of Dataiku using Dataiku Cloud Stacks for AWS.

Deploying a Dataiku instance is a three-step process:

  • Gather information

  • Deploy Fleet Manager

  • Deploy a first instance

We’ll use an AWS CloudFormation template for deploying Dataiku Fleet Manager. Then, we’ll use Fleet Manager to deploy our first Dataiku instance.

Fleet Manager handles the entire lifecycle of Dataiku instances, freeing you from most administration tasks. A fleet is a collection of Dataiku resources, such as nodes, that are deployed together. A blueprint is a set of pre-configured instances or complete fleets.

At the end of this article, you’ll have set up a Dataiku Cloud Stack for AWS with Dataiku Fleet Manager and a single Dataiku Design node without Elastic AI capabilities.

Completing these steps will help you understand the basics of Fleet Manager so that you’ll be able to deploy more complex instances, including complete fleets.

Step 1. Gather Information

Gathering the necessary information up front will make the process run more smoothly. You can find the following information in your AWS environment or by contacting your network administrator. You’ll need it to complete the CloudFormation template.

Minimum Requirements

To deploy a single design node without Kubernetes (Elastic AI) capabilities, gather the following information:

  • Landing zone information, including the AWS account and region.

  • An AWS Identity and Access Management (IAM) role dedicated to managing the underlying Dataiku Infrastructure-as-a-Service (IaaS). This role is only accessible to administrators. It owns the following:

    • Amazon EC2 permissions to create/delete/run instances, volumes, and snapshots

    • Amazon VPC permissions to create security groups

    • IAM permission to pass the Fleet Manager role to Dataiku Fleet Manager instances

  • An Amazon VPC with a CIDR /24 size with DNS support enabled

    • The requirements for deploying Elastic AI resources are different

  • An AWS keypair

  • Optional: Determine if you want a public or private IP address to access Fleet Manager.

Requirements for Deploying with Elastic AI Capabilities

If you want to go beyond the steps prescribed in this article and deploy a node with Kubernetes (Elastic AI) capabilities, gather the following additional information:

  • A dedicated IAM role for Fleet Manager for managing and interacting with elastic computation. This role is only accessible to administrators. It owns the following:

    • Amazon ECR permissions to push images

    • Permission to create and operate Amazon EKS clusters

  • An Amazon VPC with a CIDR /16 size with DNS support enabled

    • This replaces the requirements for deploying without Elastic AI

  • An Amazon VPC with two /20 subnets in different availability zones
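
The CIDR requirements above can be checked before you fill in the template. The following is a minimal sketch using Python’s standard ipaddress module. The function names and the 10.0.0.0 address ranges are illustrative assumptions, and the sketch assumes that a VPC larger than the stated size is also acceptable; confirm this with your network administrator.

```python
import ipaddress

# Prefix lengths taken from the requirements listed in this article.
REQUIRED_VPC_PREFIX = {"minimal": 24, "elastic_ai": 16}


def vpc_is_large_enough(vpc_cidr: str, mode: str) -> bool:
    """Return True if the VPC is at least as large as the required size.

    Assumption: a larger VPC than the stated size is acceptable.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # A smaller prefix length means a larger network.
    return vpc.prefixlen <= REQUIRED_VPC_PREFIX[mode]


def first_two_slash20_subnets(vpc_cidr: str) -> list[str]:
    """Carve the first two /20 ranges out of a VPC CIDR.

    Each range would then be assigned to a subnet in a different
    availability zone. Raises ValueError if the VPC is smaller than /20.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = vpc.subnets(new_prefix=20)
    return [str(next(subnets)), str(next(subnets))]


# Example with a hypothetical /16 VPC intended for Elastic AI.
print(vpc_is_large_enough("10.0.0.0/16", "elastic_ai"))
print(first_two_slash20_subnets("10.0.0.0/16"))
```

The sketch only checks address arithmetic. It does not verify that DNS support is enabled on the VPC or that the subnets actually exist in your AWS account.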

Step 2. Set Up and Deploy Fleet Manager

In this section, we’ll use a Dataiku AWS CloudFormation template to deploy a CloudFormation stack that contains the following infrastructure:

  • A Dataiku Fleet Manager VM including its storage, an instance profile, and a security group

  • An AWS IAM role used to create a daily backup policy for Fleet Manager

The resources shown in the diagram are needed to set up and deploy Fleet Manager:

../../../_images/aws-cloudstack-01.png

Complete the CloudFormation Template

To set up and deploy Fleet Manager, we’ll use an AWS CloudFormation template. The template is hosted at a public Amazon S3 URL and is specific to a version of Fleet Manager. For example, for Fleet Manager v10.0.2, the URL is: https://dataiku-cloudstacks.s3.amazonaws.com/templates/fleet-manager/10.0.2/fleet-manager-instance.yml

To retrieve and upload the template:

  • Open the AWS CloudFormation console.

  • Choose Create stack > With new resources (standard).

  • Visit the installation documentation and then copy the Amazon S3 URL.

  • In the Specify template section, enter the URL in Amazon S3 URL.

  • Select Next.
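
Because the template URL embeds the Fleet Manager version, it can be convenient to build it from a version string. The sketch below assumes that other versions follow the same path pattern as the 10.0.2 example in this article; that pattern is an assumption, so always confirm the URL against the installation documentation. The function name is illustrative.

```python
# Base path taken from the 10.0.2 example URL in this article.
TEMPLATE_BASE = "https://dataiku-cloudstacks.s3.amazonaws.com/templates/fleet-manager"


def fleet_manager_template_url(version: str) -> str:
    """Build the CloudFormation template URL for a Fleet Manager version.

    Assumption: every version uses the same path layout as 10.0.2.
    """
    return f"{TEMPLATE_BASE}/{version}/fleet-manager-instance.yml"


print(fleet_manager_template_url("10.0.2"))
```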

Specify the Stack Details

  • On the Specify stack details page, enter the Stack name and parameters.

  • Select Next.

Stack Details Guidelines

  • In VPC id, enter the VPC ID where Fleet Manager will be deployed.

  • In VPC CIDR, enter the CIDR of the VPC using “X.X.X.X/X” format.

  • In Subnet Id, enter the subnet ID where Fleet Manager will be deployed.

  • In IP addresses allowed to connect, enter 0.0.0.0/0 to authorize TCP connections to Fleet Manager from anywhere, or enter your own IP address range.

  • In SSH Key Pair, select an existing AWS SSH keypair that is allowed to connect to Fleet Manager.

  • In Fleet Manager IAM Role, enter the name of your dedicated Fleet Manager IAM role.

  • In Username, the default Fleet Manager username displays.

  • In Password, enter a strong password for the Fleet Manager admin. You’ll use this password to access Fleet Manager and manage your Dataiku instances.

  • In Instance type, specify the desired instance type for the Fleet Manager instance.

  • In AssociatePublicIpAddress, specify whether or not you need a public IP address. Select true if you want a public IP address for connecting to Fleet Manager; otherwise, select false.

    • If using a public IP address, you may need to consider how to secure it.

  • Leave (Optional) Restore from this SnapshotID empty unless migrating or upgrading from a snapshot. “Null” is the default setting for a new installation.

    • To migrate or upgrade from a snapshot, enter the snapshot ID of the data volume of the existing Fleet Manager.

  • In (Optional) Volume encryption Key ID, specify a custom KMS Key ID for the Fleet Manager disks.
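
Some of these values can be sanity-checked before you submit the stack. The following is a minimal sketch, not part of the Dataiku template: the function and argument names are hypothetical and do not correspond to the template’s actual parameter keys. It checks that the VPC CIDR uses the “X.X.X.X/X” format and flags the combination of a public IP address with an allowed range of 0.0.0.0/0.

```python
import ipaddress


def review_stack_inputs(vpc_cidr: str, allowed_cidr: str, public_ip: bool) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []

    # "VPC CIDR" is expected in X.X.X.X/X format, so the prefix is mandatory.
    try:
        if "/" not in vpc_cidr:
            raise ValueError("missing /prefix")
        ipaddress.IPv4Network(vpc_cidr)
    except ValueError as exc:
        findings.append(f"VPC CIDR is not a valid X.X.X.X/X range: {exc}")

    # "IP addresses allowed to connect" may be 0.0.0.0/0 or a narrower range.
    try:
        allowed = ipaddress.IPv4Network(allowed_cidr, strict=False)
    except ValueError as exc:
        findings.append(f"Allowed range is not a valid CIDR: {exc}")
    else:
        if allowed.prefixlen == 0 and public_ip:
            findings.append(
                "Fleet Manager would accept connections from any address "
                "on a public IP; consider a narrower range."
            )

    return findings


# Example with documentation-only address ranges.
for finding in review_stack_inputs("10.0.0.0/24", "0.0.0.0/0", public_ip=True):
    print(finding)
```

An empty result does not mean the stack will deploy successfully. It only means these two fields passed the basic checks described here.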

Configure the Stack Options

  • In Tags, you can specify tags to apply to resources in your stack.

  • Keep the default values for the remaining parameters.

  • Choose Next.

Review and Create the Stack

  • On the Review page, review the details of your stack.

  • Choose Create stack to launch the stack.

../../../_images/create-aws-stack.png

You are now ready to deploy your first instance of Dataiku.

Step 3. Deploy Your First Dataiku Instance

Log in to Fleet Manager

To connect to Fleet Manager, you’ll need to determine the Fleet Manager IP Address. To do this:

  • In the AWS CloudFormation Console, navigate to the Stack details and open the Resources tab.

  • Select the Physical ID for the instance.

../../../_images/stack-resources-instance-id.png

  • Copy the Fleet Manager IP address.

  • Paste the IP address into your browser using the https:// prefix.

    • For example, if your IP address is xx.x.xx.xx, you’ll type https://xx.x.xx.xx in your browser.

Wait while Fleet Manager starts.

  • Enter the administrator username and password you specified in the stack details.

The Dataiku Fleet Manager home page displays.

Configure Your Dataiku License

To deploy a Dataiku instance, you’ll first need to configure your Dataiku license. One way to do this is by using the settings under Cloud setup.

  • Under Settings, choose Cloud setup.

  • Select Edit.

  • In License mode, choose Manually entered.

  • In License file, enter your license file. Be sure to copy the entire contents of the JSON file including the final ‘}’.

  • Save your changes.

The license file now displays the license expiration date.
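
A truncated paste, such as one missing the final ‘}’, is easy to catch before you save. The sketch below uses Python’s standard json module to confirm that the text is a complete JSON object. The function name is illustrative, the sample keys are placeholders rather than real license fields, and the check says nothing about whether the license itself is valid.

```python
import json


def check_license_text(text: str) -> dict:
    """Parse pasted license text and return it as a dictionary.

    Raises ValueError if the text is not a complete JSON object.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"License text is not complete JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("License text should be a JSON object enclosed in { }")
    return parsed


# Placeholder content only; a real license file has its own fields.
complete = '{"placeholder": "value"}'
truncated = '{"placeholder": "value"'

print(check_license_text(complete))
try:
    check_license_text(truncated)
except ValueError as exc:
    print(exc)
```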

Deploy a Fleet Blueprint

Fleet blueprints allow you to deploy pre-configured instances or complete fleets with minimal setup. Fleet blueprints automatically create instances, instance setting templates, and virtual networks. Using fleet blueprints, you can deploy everything from a minimal design with a single design node to a full fleet with Elastic AI capabilities.

In this section, we’ll deploy a single design node without Kubernetes (Elastic AI) capabilities.

  • Under Quick Start, choose Fleet blueprints.

  • Choose Deploy Minimal Design.

../../../_images/deploy-minimal-design.png

Fleet Manager displays the fleet configuration page.

  • Enter a descriptive name for the fleet.

    • Follow your organization’s naming convention for network resources.

    • For example, “admin-learning-dss-01”.

    • This name is added to all deployed instances, instance templates and virtual networks.

  • Optional: Add tags to apply to your AWS resources.

  • Select Deploy.

Fleet Manager deploys the fleet and lets you know that the fleet is ready for provisioning.

  • Select OK.

Return to the Fleet Manager home page to view All instances. You are now ready to provision the instance.

Provision the Instance

The Dataiku instance(s) you deployed using the blueprint are not yet available to Dataiku users. In this section, we’ll provision the instance.

Before provisioning any instance(s), you can specify configuration options, including the VM type and the name of the AWS objects, and add tags.

  • From Instances, choose All and then locate the new instance.

  • Select the name of the instance to view its settings.

The status of the instance is Not provisioned.

  • Navigate to the Settings tab of the instance.

Each field is labeled to let you know if the change requires reprovisioning.

  • If you made any changes, select Save.

You can now provision the instance to make it available to Dataiku users.

  • Select the Save menu arrow to display the list.

  • Choose Provision.

../../../_images/provision-button.png

Wait while Fleet Manager provisions the instance.

In the Fleet Manager home page, you can see that the status of the instance has changed from Not provisioned to Running.
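
If you script around this step, waiting for the status change amounts to a polling loop. This article does not describe a programmatic way to read the instance status, so the sketch below takes a hypothetical get_status callable that you would supply yourself. Only the status strings “Not provisioned” and “Running” come from this article; the timeout and interval values are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    target: str = "Running",
    timeout_s: float = 900.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status until it returns target or the timeout elapses.

    get_status is a hypothetical callable supplied by the caller.
    Returns True if the target status was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example with a fake status source standing in for a real lookup.
fake_statuses = iter(["Not provisioned", "Not provisioned", "Running"])
print(wait_for_status(lambda: next(fake_statuses), interval_s=0.0))
```

The sleep argument is injectable so that the loop can be exercised without real delays, as in the example.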

The resources deployed in your AWS account now look like this:

../../../_images/aws-cloudstack-02.png

You are now ready to launch the Dataiku instance.

Launch Dataiku

In this section, we’ll launch the Dataiku instance.

  • From Instances, choose All and locate the newly provisioned instance.

  • Select the name of the instance to view its settings.

The dashboard displays the following information:

  • The instance details, including the storage capacity

  • Settings templates

  • The VM’s Virtual network connection

  • Basic VM information

  • Agent logs

To launch Dataiku:

  • Select Retrieve to retrieve the initial admin password.

    • Copy the password. It will display only once.

  • Select Go to DSS.

../../../_images/instance-status-running.png

  • Sign in with your admin username and password.

  • Change the initial password.

Note

You can manage the admin password by visiting the Administration menu in your Dataiku instance. Visit the Security tab and then select the admin user. In Change password, enter a new password.