Concept | Model packaging for deployment

Once we’ve designed our machine learning model and deployed the active version to the Flow, we’ll need to package the model for deployment to production.

Flow chart of model packaging for deployment.

Challenges of creating a model package

When deploying a model to production, we are not only deploying code. We are deploying a model package. The components of a model package, such as the model and the datasets used to train it, can have dependencies that need to be available at runtime.

When the same dependencies are available at runtime, predictions made in the development environment can be reproduced in the production environment. Problems arise when the versions of these dependencies differ between the two environments.
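As an illustration of the version problem, here is a minimal Python sketch that compares the library versions pinned at training time with those installed in the current environment. It is not part of Dataiku; the function names and the example pins are hypothetical, and it only uses the standard library's importlib.metadata.

```python
from importlib import metadata


def _installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_version_drift(pinned, lookup=_installed_version):
    """Compare pinned versions with what the environment provides.

    `pinned` maps package names to the exact version used in development.
    `lookup` is any callable returning the installed version (or None).
    Returns {package: (expected, found)} for every mismatch.
    """
    drift = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            drift[name] = (expected, found)
    return drift


if __name__ == "__main__":
    # Hypothetical pins recorded when the model was trained.
    development_pins = {"scikit-learn": "1.3.2", "pandas": "2.1.4"}
    for package, (expected, found) in find_version_drift(development_pins).items():
        print(f"{package}: expected {expected}, found {found}")
```

An empty result means the current environment matches the pins. Any entry in the result is a candidate explanation for predictions that differ between development and production.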

Important

In Dataiku, we refer to the model package as a saved model. However, ML practitioners might also refer to it as a model artifact.

Typical package components

Once our code and artifacts are stored in a centralized repository, we can build a testable and deployable bundle of the project.

A model package includes the following components:

  • Documented code for implementing the model including its preprocessing

  • Hyperparameters and their configuration

  • Training and validation data

  • Data for testing scenarios

  • The trained model, in its runnable form

  • A code environment including libraries with specific versions and environment variables
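To make the list above concrete, the components can be sketched as a simple manifest that a deployment step checks before shipping. This is only an illustrative data structure, not the format Dataiku uses for a saved model; the class name, field names, and paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelPackageManifest:
    """Hypothetical manifest mirroring the typical package components."""

    code_path: str = ""                                   # documented model and preprocessing code
    hyperparameters: dict = field(default_factory=dict)   # hyperparameters and their configuration
    training_data: str = ""                               # reference to the training data
    validation_data: str = ""                             # reference to the validation data
    test_scenario_data: str = ""                          # data for testing scenarios
    trained_model_path: str = ""                          # trained model in its runnable form
    pinned_libraries: dict = field(default_factory=dict)  # library name -> exact version
    environment_variables: dict = field(default_factory=dict)

    # Assumption for this sketch: environment variables may be empty,
    # every other component must be present.
    REQUIRED = (
        "code_path",
        "hyperparameters",
        "training_data",
        "validation_data",
        "test_scenario_data",
        "trained_model_path",
        "pinned_libraries",
    )

    def missing_components(self):
        """Return the names of required components that are empty."""
        return [name for name in self.REQUIRED if not getattr(self, name)]


if __name__ == "__main__":
    manifest = ModelPackageManifest(
        code_path="src/model/",
        hyperparameters={"max_depth": 6},
        training_data="datasets/train",
        trained_model_path="artifacts/model.pkl",
    )
    print("Missing components:", manifest.missing_components())
```

A check like this catches an incomplete package early, before the missing piece surfaces as a runtime failure in production.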

In Dataiku, a green diamond icon in the Flow represents the model package.

Dataiku screenshot of a Flow including a saved model and scoring recipe.

Important

A model package is not the same as a project bundle. A project bundle contains metadata of a saved model (and other objects). To learn more, visit Creating a bundle in the reference documentation.

To view the active model’s components (the trained model and its metadata), double-click the saved model in the Flow, and then select the active version.

Dataiku screenshot of a model report.

See also

If you are unfamiliar with saved models, see the ML Practitioner learning path.

Works Cited

Mark Treveil and the Dataiku team. Introducing MLOps: How to Scale Machine Learning in the Enterprise. O’Reilly, 2020.