Machine Learning (ML) Model Packages
Once we’ve designed our machine learning model and deployed the active version to the Flow, we’ll want to package the model for deployment to production.
Challenges of Creating a Model Package
When deploying our model to production, we are not only deploying code; we are deploying a model package. The components of a model package, such as the model and the datasets used to train it, can have dependencies that need to be available at runtime. Making these dependencies available at runtime ensures that the predictions the model made in the development environment are reproducible in the production environment. Problems can arise when the versions of these dependencies differ between the development and production environments.
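As an illustration of the version-mismatch problem, the following minimal sketch compares a set of pinned library versions against what is installed in the current environment. It uses only the Python standard library and is not part of Dataiku's API; the function names and the pinned versions shown are hypothetical.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_version_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each mismatched package name to (expected version, found version).

    `pinned` is a hypothetical record of the versions used in development.
    `lookup` defaults to the current environment but can be swapped out,
    for example to check a version listing exported from production.
    """
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins recorded at training time.
    dev_pins = {"numpy": "1.26.4", "scikit-learn": "1.4.0"}
    for pkg, (expected, found) in find_version_mismatches(dev_pins).items():
        print(f"{pkg}: expected {expected}, found {found}")
```

Running a check like this in the production environment before serving predictions surfaces any library whose version has drifted from the one used in development.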
In Dataiku, we refer to the model package as a saved model. However, ML practitioners might also refer to it as a model artifact.
Typical Package Components
Once our code and artifacts are stored in a centralized repository, we can build a testable and deployable bundle of the project. The model package includes the following components:
Documented code for implementing the model including its preprocessing
Hyperparameters and their configuration
Training and validation data
Data for testing scenarios
The trained model, in its runnable form
A code environment including libraries with specific versions and environment variables
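To make the list above concrete, here is a minimal sketch of a manifest that records each component and reports which ones are still missing. This is an illustrative structure only, not Dataiku's internal format for a saved model; the class name, field names, and file paths are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List

# Assumption for this sketch: environment variables may legitimately be empty.
OPTIONAL_FIELDS = frozenset({"environment_variables"})


@dataclass
class ModelPackageManifest:
    """Hypothetical record of the components that make up a model package."""

    code_path: str = ""                   # documented model and preprocessing code
    hyperparameters: Dict[str, object] = field(default_factory=dict)
    training_data: str = ""
    validation_data: str = ""
    test_scenario_data: str = ""
    trained_model_path: str = ""          # the trained model in its runnable form
    pinned_libraries: Dict[str, str] = field(default_factory=dict)
    environment_variables: Dict[str, str] = field(default_factory=dict)

    def missing_components(self) -> List[str]:
        """Return the names of required components that are empty, in declaration order."""
        return [
            f.name
            for f in fields(self)
            if f.name not in OPTIONAL_FIELDS and not getattr(self, f.name)
        ]


if __name__ == "__main__":
    manifest = ModelPackageManifest(
        code_path="src/",
        hyperparameters={"max_depth": 6},
        training_data="train.csv",
        trained_model_path="model.pkl",
    )
    print(manifest.missing_components())
```

A completeness check of this kind can run before packaging, so that a package lacking, say, its pinned library versions is caught before it reaches production.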
In Dataiku, the model package is represented by a green diamond icon in the Flow.
The model package is not the same as a project bundle. A project bundle contains more than just the model package: it is a snapshot of the project’s metadata at the time of its creation. To learn more, visit Bundle contents.
To view the active model’s components (the trained model and its metadata), you can visit the model summary page.
Mark Treveil and the Dataiku team. Introducing MLOps: How to Scale Machine Learning in the Enterprise. O’Reilly, 2020.