Concept: Batch Deployment

In this article, we’ll discuss how to push a batch-processing project to production: going from a project on a Design node to a project bundle deployed on an Automation node, with the help of the Project Deployer.

Tip

This content is also included in a free Dataiku Academy course on Projects in Production, which is part of the MLOps Practitioner learning path. Register for the course there if you’d like to track and validate your progress alongside concept videos, summaries, hands-on tutorials, and quizzes.

Slide depicting an overview of batch deployment in Dataiku.

Ready for Production

Let’s assume we have a batch-processing project ready to be deployed into production.

First, recall what a “ready” project is. A project “ready” to be deployed into production has robust scenarios with metrics, checks, and reporters; optimized pipelines; and well-documented workflows, among other qualities.

Slide depicting what constitutes a project ready for production.

Note

For review, return to the Preparing for Production course.

Development vs. Production Environments

In order to deploy a project into production, we need to transfer it from its development environment (a Design node) to a production environment (an Automation node).

Recall that a Design node is an experimental sandbox for developing data projects. Since you are creating new data pipelines and models there, you can expect jobs to fail occasionally.

Slide depicting a Design node as a development environment.

An Automation node, on the other hand, is a production environment. It is for operational jobs serving external consumers.

Recall that the Automation node also needs to be “ready” for production. Among other requirements, it should have the correct connections, such as the production version of databases, and the same plugins and code environments found in the project on the Design node.
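
The readiness requirement above amounts to a set comparison: everything the project relies on must also be available on the Automation node. The following is a minimal Python sketch of that idea only. It does not call the Dataiku API; the NodeInventory class, the missing_on_target function, and all connection, plugin, and code environment names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeInventory:
    """Hypothetical summary of what a project needs or a node provides."""
    connections: frozenset = field(default_factory=frozenset)
    plugins: frozenset = field(default_factory=frozenset)
    code_envs: frozenset = field(default_factory=frozenset)


def missing_on_target(required: NodeInventory, available: NodeInventory) -> dict:
    """Return, per category, the items that are required but absent on the target."""
    gaps = {
        "connections": required.connections - available.connections,
        "plugins": required.plugins - available.plugins,
        "code_envs": required.code_envs - available.code_envs,
    }
    # Keep only the categories that actually have gaps, sorted for stable output.
    return {name: sorted(items) for name, items in gaps.items() if items}


# Illustrative values only.
project_needs = NodeInventory(
    connections=frozenset({"sql_warehouse", "object_store"}),
    plugins=frozenset({"geo-tools"}),
    code_envs=frozenset({"py39_scoring"}),
)
automation_node = NodeInventory(
    connections=frozenset({"sql_warehouse", "object_store"}),
    plugins=frozenset(),
    code_envs=frozenset({"py39_scoring"}),
)

print(missing_on_target(project_needs, automation_node))
```

In this example the only gap reported is the plugin, so that is what would need to be installed before deploying. Comparing connections name for name is a simplification, since connection names can differ between development and production.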

Slide depicting an Automation node as a production environment.

Note

You can review this material in an article on Dataiku Architecture in the Production Concepts course.

The Project Deployer

How do we transfer a project from a Design node to an Automation node?

We could manually download the project bundle from the Design node and upload it to the Automation node. However, this is not the preferred way. Instead, we’ll use the Project Deployer, which makes this process easier.

Some of the following details are interesting only to administrators, so we won’t cover them in great detail.

The Project Deployer is one of two components of the Deployer:

  • The Project Deployer is used to batch deploy project bundles to an Automation node (which is our focus here).

  • The API Deployer is used to deploy real-time API services to an API node (as will be done in the Real-time APIs course).

Slide depicting relationships between the Deployer and Design, Automation, and API nodes.

Architecture

If your architecture has a single Design or Automation node, the Project Deployer can be part of this DSS node itself—a local Deployer. In that case, no additional setup is required. It comes pre-configured.

If your infrastructure includes multiple Design and/or Automation nodes, a separate node can act as the centralized Deployer for all Design and Automation nodes. This is a standalone or remote Deployer.

Your instance administrator will also need to follow the product documentation to set up the infrastructure that enables these nodes to “talk” to each other. Since our focus here is on deploying, we’ll consider this task already completed.

Slide depicting the infrastructure necessary to deploy.

Note

Setting up the Deployer is addressed in the product documentation.

Project Bundles

The component that we do need to concern ourselves with is the project bundle being deployed.

You’re probably familiar with creating an export of a Dataiku project from the homepage of a Design node. This kind of export can be imported to other Design nodes for further development.

Unlike an exported project, a project bundle is a versioned snapshot of the project’s configuration. On an Automation node, this snapshot can replay the tasks that were performed on the Design node.

Slide depicting the contents of a project bundle.

By project configuration, we mean things like project settings, notebooks, visual analyses, recipes, scenarios, shared project code, and the metadata from objects like datasets, saved models, and managed folders.

Once a project is ready to be deployed, creating a project bundle is simple. Just navigate to the Bundles page from the “More Options” menu, and click to create a new bundle.

Dataiku screenshot of where to create a new bundle.

Additional Bundle Content

You then have the option of adding more content to the project bundle. By default, a project bundle does NOT include the actual data, nor the saved models deployed to the Flow. This is because when the project is running on the Automation node, you’ll have new production data running through the Flow.

Dataiku screenshot showing the bundle configuration page.

This bundle includes three uploaded datasets and a saved model.


Depending on your use case, however, you may want to include extra content from certain datasets, managed folders, or saved models. For example, you may need to include enrichment or reference datasets that are not recomputed in production. Or, if you plan to score data with a model that has been trained on the Design node, you need to add the model to the bundle.
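
To make the distinction between default and optional content concrete, here is a small Python sketch that models a bundle as configuration plus an explicit list of opted-in content. It is a conceptual model only, not Dataiku’s bundle format or API; the BundleSpec class and all dataset and model names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BundleSpec:
    """Hypothetical model of what a project bundle carries."""
    bundle_id: str
    # Always present: configuration and metadata (abbreviated list).
    configuration: tuple = ("project settings", "recipes", "scenarios")
    # Opt-in only: data and saved models are excluded unless explicitly added.
    included_datasets: set = field(default_factory=set)
    included_models: set = field(default_factory=set)

    def add_dataset(self, name: str) -> None:
        self.included_datasets.add(name)

    def add_model(self, name: str) -> None:
        self.included_models.add(name)

    def carries_content_for(self, name: str) -> bool:
        """True if the object's data or model was explicitly added to the bundle."""
        return name in self.included_datasets or name in self.included_models


bundle = BundleSpec(bundle_id="v1")
bundle.add_dataset("country_codes")  # reference data not recomputed in production
bundle.add_model("churn_model")      # trained on the Design node, used for scoring

print(bundle.carries_content_for("country_codes"))
print(bundle.carries_content_for("daily_transactions"))
```

The dataset that was never added travels as metadata only, which matches the expectation that production data will flow through it on the Automation node.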

Slide depicting what's included and not included in a project bundle.

Deploying a Bundle

After creating the bundle, you can publish it to the Project Deployer. From the Deployer, you can manage all of the bundles from all of your projects in their various stages of production. If your infrastructure is in place, deploying the bundle takes only a few more clicks.

For any particular deployment, you can manage settings, such as remapping connections between the development and production environments.
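
Connection remapping can be pictured as a lookup applied to each dataset’s connection. The Python sketch below illustrates that idea under stated assumptions: it is not how the Deployer implements remapping, and the remap_connections function and all dataset and connection names are hypothetical.

```python
def remap_connections(datasets: dict, mapping: dict) -> dict:
    """Return a copy of `datasets` with connection names swapped per `mapping`.

    `datasets` maps a dataset name to its connection name. Connections
    without an entry in `mapping` are left unchanged.
    """
    return {name: mapping.get(conn, conn) for name, conn in datasets.items()}


# Illustrative values only.
design_datasets = {
    "customers": "dev_postgres",
    "transactions": "dev_postgres",
    "country_codes": "filesystem_managed",
}
dev_to_prod = {"dev_postgres": "prod_postgres"}

production_datasets = remap_connections(design_datasets, dev_to_prod)
print(production_datasets)
```

Both datasets on the development database now point at its production counterpart, while the dataset on the unmapped connection is left as it was. The original dictionary is not modified.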

Dataiku screenshot of where to manage deployment settings from the Deployer.

To verify your deployment is working, open your Automation node to see the project now running in a production environment!

What’s Next?

Those are the basics of creating a project bundle on a Design node and transferring it to an Automation node. Follow the hands-on tutorials to gain experience doing this yourself!

Note

You can learn more about production deployments and bundles in the product documentation.