Deploying a Flow to Production

Introduction

Once you have designed a Flow and automated updates to the Flow, you can deploy it to a production environment.

Note

Development and Production environments

A development (or sandbox) environment is an environment where you test new analyses in your project. Failures in this environment are an expected part of its experimental nature.

A production environment is where serious operational jobs are run. This environment should be available whenever necessary and may serve external consumers for their day-to-day decisions, whether those consumers are humans or software. Failure is not an option in production, and the ability to roll-back to a previous version is critical.

Dataiku provides two dedicated nodes to handle development and production:

  • Dataiku Design Node is used for the development of data projects.

    • It provides capabilities for the creation of data pipelines amd models, plus the definition of how they are meant to be reconstructed. Projects developed in the Design Node are packaged and handed off to the Automation Node.

  • Dataiku Automation Node is used to import packaged projects defined in the Design Node and run them in the production environment.

    • When you make updates to the project in the Design node, you can create an updated version of the project package, import the new package into the Automation node, and control which version of the project runs in production.

Development work from the Design node flows to the Automation node, and while it is technically possible to make changes to a project in the Automation node, those changes don’t flow back to the Design node, so it’s best practice to do all development in the Design node.

Let’s Get Started!

In this tutorial, you will learn how the Design and Automation Nodes work together:

  • Packaging flows for deployment

  • Versioning flows

  • Deploying packages in a production environment

We will work with the fictional retailer Haiku T-Shirt’s data.

Prerequisites

This tutorial assumes that:

  • You have completed the Automation course.

  • You have access to a:

We also assume that an infrastructure has been defined on the Deployer to connect the Design node to the Automation node.

Create Your Project

From the homepage of the Dataiku Design node, click +New Project > DSS Tutorials > Automation > Deployment (Tutorial).

For the purposes of this tutorial, the flow and automation scenarios are complete, and we simply need to package the flow and deploy it to the Automation node.

Packaging a Flow into a Bundle

  • In order to package the flow into a bundle, from Settings menu in the top navigation bar, choose Bundles.

  • Click Create your first bundle and name it automation_v1.

Note

Key concept: Bundle

A bundle is a snapshot of a complete DSS project.

The bundle includes the project configuration so that it can be deployed to a Dataiku DSS Automation node. In addition, sometimes, in your Flow, the data for some datasets (such as enrichment data) or models (that are retrained in the development and not the production environment) need to be transported to the production environment.

A bundle can contain data for an arbitrary number of datasets, managed folders and saved models.

A bundle thus acts as a consistent packaging of a complete flow. On the Automation node, you then activate a bundle to switch the project to a new version. Bundles are versioned, and you can revert to a previous bundle in case of a production issue with the new bundle.

You can set up multiple Automation nodes to create continuous delivery pipelines (for example with a pre-production automation node, a performance test one, and the production one).

For this tutorial, we will include the data for the Orders and Customers managed folders. In a real-life setting however, these primary data sources would be different on the development and production environments.

  • Click Create.

The following video goes through what we just covered.

Deploying a Bundle

  • Select the bundle you’ve just created.

  • In the Actions panel, click Publish on Deployer and publish the automation_v1 bundle.

When it completes, open the bundle on the Deployer. Now it’s time to create a new Deployment and push it to the Automation node.

Note

Deployments

The bundle contains the information about the project from the Design node. When combined with information on a production infrastructure, this becomes a Deployment that can be pushed to an Automation node.

  • Click Deploy.

  • In the New Deployment dialog, select the target infrastructure for the Automation node and click Create.

  • In the Status tab of the new deployment, click Deploy and Activate.

Within the deployment, you can change the deployment parameters on the Settings tab. For example, by default, scenarios are not activated upon deployment.

  • Navigate to the Scenarios panel of the Settings tab, and click Activate All.

  • Click Save and Update.

That’s it! The flow you set up in the Design node is now running in production. If you open your Automation node, you’ll see the project has been deployed.

Note

Connections mapping

Note that you may need to re-map connections to data sources that exist in the Design node to how they should or will exist on the Automation node. Dataiku DSS will prompt you if this is necessary.

In the Automation node, you need to have connections of the proper type, but their definition can change.

A simple example of this is a SQL database: you’ll have a production database separate from the development database, so when deploying the bundle, you’ll need to reattach the SQL datasets to the production database.

The following video goes through what we just covered.

Versioning a Flow

Now, let’s say that we want to change the scenario in the Flow so that it runs on a monthly basis, rather than when the underlying data sources are updated. To do this, we will:

  • update the project on the Design node

  • repackage the project into a new bundle version

  • deploy the new bundle to the Automation node

Here are the detailed steps:

  • Open the original project on the Design node. Navigate to the Scenarios tab and open the Rebuild data retrain model scenario.

  • Turn off the existing trigger. Rather than deleting it, in case you want to switch back to this trigger later, click Add trigger, and select Time-based trigger.

  • Name the trigger Monthly rebuild & retrain. Make the frequency of the trigger Monthly and have it set to trigger on the 1st of each month. Save your scenario.

  • Navigate to the Bundles area. Click Create bundle and name it automation_v2.

  • Leave a descriptive release note for your colleague in charge of the production environment, like Changed the scenario trigger to be monthly. These changes are now visible in the bundle’s commit log and diff tabs (accessible in the upper right), both here on the Design node and when the bundle is redeployed on the Automation node.

  • Click Create.

The following video goes through what we just covered.

Updating a Deployment

Now we want to update our Deployment with the new project bundle, and then push the changes to the Automation node.

  • With the new bundle selected, click Publish on Deployer, then Open in Deployer.

  • The Deployer opens with the new bundle. Click Deploy.

  • We want to update the existing Deployment, so ensure that the Update option is selected and then click OK.

  • Confirm the Deployment to update and click OK.

  • Click Update to push the changes to the Automation node.

The project in production has been updated to execute on a monthly basis. You can check this by navigating to the Scenarios of the project on the Automation node and seeing that the changes you made on the Design node are visible in the Automation node.

A scenario in a project deployed to the Automation node.  The trigger is updated with changes made on the Design node.

If you need to roll back to a previous bundle version, go to the Deployer, select the version you want, and Deploy it as an update.

The deployments dashboard of the Deployer, showing two bundles from the same project.  One is actively deployed.

What’s Next

Congratulations! Deploying a Flow to production and managing versions of the flow is easy to do in Dataiku DSS.