Tutorial | Monitoring models: An introduction to monitoring in different contexts#
When deploying a model to production, monitoring is an important topic to tackle upfront. The first step is to define what you want to monitor, why, and with what consequences, using the usual methods provided by scenarios.
Once you have a good understanding of your requirements, the next step is the implementation. At this stage, we recommend following Dataiku’s resources on MLOps, such as the MLOps learning path, to understand the features at play.
However, your ML landscape might be heterogeneous. You might have models running in various contexts: some fully inside Dataiku’s ecosystem, and others outside, through model exports or external deployments. Even in contexts where model scoring is done outside of Dataiku, model monitoring can still be done inside Dataiku.
This set of articles explains how to design a model monitoring feedback loop in several different contexts.
The first two cases demonstrate model scoring and monitoring entirely within Dataiku:
A deployed model scored with a batch Flow
A deployed model scored as an API endpoint
The others demonstrate model monitoring within Dataiku in situations where model scoring is done outside of Dataiku:
A model exported in Java
A model exported in Python
To focus on the choices of model monitoring in different contexts, we have simplified the configuration for these cases to the greatest degree possible. However, we assume you have an understanding of MLOps and the deployment capabilities of Dataiku. If not, please see the MLOps Practitioner learning path.
For any of the above cases, you’ll need:
A business or enterprise license for Dataiku (version 11 or above).
Each of the cases listed above may have additional requirements, listed at the beginning of its article.
Create the project#
The use case is based on the Kaggle Pokemon dataset.
From the Dataiku Design homepage, click +New Project > DSS tutorials > MLOps Practitioner > Model Monitoring.
You can also download the starter project from this website and import it as a zip file.
Explore the Design Flow#
First review the Design Flow zone.
Every row in the pokemon dataset is a different Pokemon, with columns representing dozens of characteristics and abilities.
Every Pokemon belongs to one of eighteen different types (represented as type1 in the dataset), such as water, normal, grass, etc.
After some basic data cleaning in the Prepare recipe, we have built a standard multi-class prediction model to predict the type of Pokemon using Dataiku’s AutoML tool, and then deployed it to the Flow.
Once you understand the basic use case at hand, build the Flow before moving ahead to the monitoring instructions.
From the Flow, click Build on the Design Flow zone.
Click Build once more to build the pipeline ending with the prediction model.
Ground truth vs. input drift monitoring#
In all of the monitoring contexts presented here, we have chosen to demonstrate input drift monitoring rather than ground truth monitoring. If you examine the pokemon_for_scoring dataset, you’ll see that the target variable type1 is removed in the Prepare recipe.
To simplify matters, our working hypothesis is that we do not know the true outcome for the model’s predictions. Accordingly, all Evaluate recipes skip the computation of performance metrics.
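To make the idea of input drift concrete, here is a minimal sketch in plain Python. It is not how Dataiku's Evaluate recipe computes drift. It swaps in the Population Stability Index (PSI), a commonly used drift statistic, and applies it to a hypothetical categorical input feature. The function name, the sample values, and the epsilon floor are all assumptions made for illustration.

```python
import math
from collections import Counter


def psi_categorical(reference, current, epsilon=1e-6):
    """Population Stability Index between two samples of one categorical feature.

    `epsilon` (an assumed floor) avoids log(0) when a category is absent
    from one of the two samples.
    """
    categories = set(reference) | set(current)
    ref_counts = Counter(reference)
    cur_counts = Counter(current)
    psi = 0.0
    for category in categories:
        ref_share = max(ref_counts[category] / len(reference), epsilon)
        cur_share = max(cur_counts[category] / len(current), epsilon)
        psi += (cur_share - ref_share) * math.log(cur_share / ref_share)
    return psi


# Hypothetical feature values: the reference sample stands in for the data
# the model was trained on, the other two for newly scored data.
reference_sample = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
same_distribution = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
shifted_sample = ["a"] * 20 + ["b"] * 30 + ["c"] * 50

psi_same = psi_categorical(reference_sample, same_distribution)
psi_shifted = psi_categorical(reference_sample, shifted_sample)
print(psi_same, psi_shifted)
```

A value of 0 means the two samples have identical category shares, and larger values mean a larger shift. Nothing in this computation needs the target variable, which is why input drift can be monitored as soon as new scoring data arrives.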
Because these two types of monitoring differ, your Flow might build multiple model evaluation stores for a single model. For example:
One Flow zone builds a model evaluation store with just prediction logs that monitors only input data and prediction drift. This scenario might run every day.
In parallel, another Flow zone builds a model evaluation store with “ground truth-enriched” prediction logs that also monitors performance drift. Depending on how difficult it is to reconcile ground truth, this data may have fewer rows or be older. This scenario might run every month.
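The difference between the two kinds of logs can be sketched in plain Python. This is only an illustration under stated assumptions: the record_id join key, the row layout, and both helper functions are hypothetical, and in a Dataiku project this reconciliation would typically be done with recipes in the Flow rather than hand-written code.

```python
# Hypothetical prediction logs: every scored record appears here.
prediction_logs = [
    {"record_id": 1, "prediction": "water"},
    {"record_id": 2, "prediction": "grass"},
    {"record_id": 3, "prediction": "fire"},
    {"record_id": 4, "prediction": "water"},
]

# Hypothetical ground truth, which arrives later and only for some records.
ground_truth = {1: "water", 2: "fire", 4: "water"}


def enrich_with_ground_truth(logs, truth):
    """Keep only the logged rows whose true label is known, and attach it."""
    return [
        dict(row, type1=truth[row["record_id"]])
        for row in logs
        if row["record_id"] in truth
    ]


def accuracy(enriched_logs):
    """Share of enriched rows where the prediction matches the true label."""
    if not enriched_logs:
        return None  # no ground truth yet, so no performance metric
    correct = sum(1 for row in enriched_logs if row["prediction"] == row["type1"])
    return correct / len(enriched_logs)


enriched_logs = enrich_with_ground_truth(prediction_logs, ground_truth)
print(len(prediction_logs), len(enriched_logs), accuracy(enriched_logs))
```

The enriched logs have fewer rows than the raw prediction logs because record 3 has no known outcome yet. Performance metrics can only be computed on the enriched subset, while drift on inputs and predictions can use every logged row.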
To gain experience computing both kinds of monitoring, see Tutorial | Model monitoring basics.
Model vs. data monitoring#
Although our focus here is model monitoring, you should recognize that model monitoring is only one leg of a robustly managed production project. The same tools of metrics, checks, and scenarios should also be applied to objects like datasets and managed folders, as they are the upstream inputs to saved models and the Evaluate recipe.
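As a rough illustration of pairing a metric with a check on upstream data, the following plain-Python sketch computes a missing-value ratio for one column and maps it to a status. It does not use Dataiku's metrics and checks features. The column name, the thresholds, and both function names are hypothetical.

```python
def missing_ratio(rows, column):
    """Metric: share of rows where `column` is absent, None, or empty."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def check_missing_ratio(ratio, warn_above=0.05, error_above=0.20):
    """Check: turn the metric value into a status (thresholds are assumed)."""
    if ratio > error_above:
        return "ERROR"
    if ratio > warn_above:
        return "WARNING"
    return "OK"


# Hypothetical input rows: one value out of ten is missing.
rows = [{"attack": 50 + i} for i in range(9)] + [{"attack": None}]

ratio = missing_ratio(rows, "attack")
status = check_missing_ratio(ratio)
print(ratio, status)
```

A scenario could then react to the status, for example by stopping a scheduled build or sending a notification before degraded data reaches the model.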
Now that you have set up your project, move on to one of the following model monitoring examples. They can be completed in any order independently of each other.