Get started#

When deploying a model to production, it is important to tackle monitoring upfront. The first step is to define what you want to monitor, why, and with what consequences, using the usual tools: metrics, data quality rules, checks, and scenarios.

Once you have a clear picture of your requirements, the next step is implementation. At this stage, we recommend following Dataiku’s MLOps resources, such as the MLOps learning path, to understand the features at play.

However, the ML landscape within your organization might be heterogeneous. You might have models running in various contexts: some fully inside Dataiku’s ecosystem and others outside — through model exports or external deployments.

Nevertheless, even when model scoring happens outside Dataiku, you can still monitor the model inside Dataiku.

Objectives#

This tutorial explains how to design a model monitoring feedback loop in several different contexts.

The first two cases demonstrate model scoring and monitoring entirely within Dataiku:

  • A deployed model scored with a batch Flow

  • A deployed model scored as an API endpoint

The last two demonstrate model monitoring within Dataiku in situations where model scoring is done outside Dataiku:

  • A model exported in Java

  • A model exported in Python

Dataiku screenshot of the Flow for all monitoring contexts.

Prerequisites#

To keep the focus on model monitoring choices across contexts, we have simplified the configuration of these cases as much as possible.

For any of the above cases, you’ll need:

  • Dataiku 12.0 or later.

  • A Full Designer user profile on the Dataiku for AI/ML or Enterprise AI packages.

  • Broad knowledge of Dataiku (Core Designer + ML Practitioner level or equivalent).

Each of the cases listed above may have additional requirements, which are noted at the beginning of its section.

Create the project#

The starter project is based on the Kaggle Pokemon dataset.

  • From the Dataiku Design homepage, click + New Project > DSS tutorials > MLOps Practitioner > Model Monitoring Contexts.

Note

You can also download the starter project from this website and import it as a zip file.

Explore the Design Flow#

First review the Design Flow zone.

  • Every row in the pokemon dataset is a different Pokemon, with columns representing dozens of characteristics and abilities.

  • Every Pokemon belongs to one of eighteen different types (represented as type1 in the dataset), such as water, normal, grass, etc.

  • After some basic data cleaning in the Prepare recipe, we have built a standard multi-class prediction model to predict the type of Pokemon using Dataiku’s AutoML tool, and then deployed it to the Flow.

Once you understand the basic use case at hand, build the Flow before moving ahead to the monitoring instructions.

  1. From the corner of the Design Flow zone, click Build.

  2. Click Build once more to build the pipeline ending with the prediction model.

Dataiku screenshot of the dialog for building the Design Flow zone.
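If you prefer to script this step rather than use the Build dialog, a rough sketch with the Dataiku Python API follows. It assumes you are running inside the project (for example, in a Dataiku notebook), that pokemon_for_scoring is one of the zone’s datasets, and that the DSSDataset.build() method is available in your version; clicking Build in the UI remains the simplest path.

```python
# Rough sketch only: build this zone's data pipeline from Python,
# assuming the code runs inside the project (e.g., a Dataiku notebook).
import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# Recursively (re)build the scoring dataset so upstream recipes run too.
# The dataset name and job type are assumptions to adapt to your Flow.
job = project.get_dataset("pokemon_for_scoring").build(
    job_type="RECURSIVE_FORCED_BUILD"
)
# Note: this builds datasets only; the deployed prediction model itself
# is already trained in the starter project.
```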

Ground truth vs. input drift monitoring#

To simplify matters, all of the monitoring contexts presented here demonstrate input drift monitoring rather than ground truth monitoring.

If you examine the pokemon_for_scoring dataset, you’ll see that the target variable type1 is removed in the Prepare recipe. Our working hypothesis is that we do not know the ground truth for the model’s predictions. Accordingly, all Evaluate recipes skip the computation of performance metrics.

Dataiku screenshot of the settings of an Evaluate recipe.
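To make concrete what input drift monitoring measures in this setup, here is an illustrative, standalone sketch of one common drift statistic, the population stability index (PSI). It is not Dataiku’s implementation, and the column name and rule-of-thumb threshold are assumptions.

```python
# Illustrative only: compare a feature's distribution in the training data
# against the new scoring data using the population stability index (PSI).
import numpy as np
import pandas as pd

def psi(reference: pd.Series, current: pd.Series, bins: int = 10) -> float:
    """Population stability index between two numeric samples."""
    # Bin edges come from the reference (training) distribution.
    edges = np.histogram_bin_edges(reference.dropna(), bins=bins)
    ref_counts, _ = np.histogram(reference.dropna(), bins=edges)
    cur_counts, _ = np.histogram(current.dropna(), bins=edges)
    # Convert counts to proportions, avoiding division by zero.
    ref_pct = np.clip(ref_counts / max(ref_counts.sum(), 1), 1e-6, None)
    cur_pct = np.clip(cur_counts / max(cur_counts.sum(), 1), 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Example usage with two hypothetical dataframes of Pokemon features:
# train_df   = data used to train the model
# scoring_df = new rows sent for scoring (no type1 column)
# drift = psi(train_df["attack"], scoring_df["attack"])
# A common rule of thumb flags PSI above ~0.2 as significant drift.
```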

Because these two types of monitoring differ, your Flow might build multiple model evaluation stores for a single model. For example:

  • One Flow zone builds a model evaluation store with just prediction logs that monitors only input data and prediction drift. This scenario might run every day.

  • In parallel, another Flow zone builds a model evaluation store with “ground truth-enriched” prediction logs that also monitors performance drift. Depending on the difficulty of reconciling ground truth, this data may have fewer rows or be older (see the sketch below). This scenario might run every month.
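As a concrete illustration of that second bullet, the toy pandas sketch below joins prediction logs with ground truth that arrives later. The column names and values are invented for the example, and this is not how the Evaluate recipe is implemented.

```python
# Illustrative only: why ground truth monitoring often lags and shrinks.
# Prediction logs must be joined with labels that arrive later, and only
# the matched rows can be evaluated for performance.
import pandas as pd

prediction_logs = pd.DataFrame({
    "pokemon_id": [1, 2, 3, 4],
    "prediction": ["water", "grass", "fire", "normal"],
})
ground_truth = pd.DataFrame({
    "pokemon_id": [1, 3],          # labels known for only part of the logs
    "type1": ["water", "grass"],
})

# The inner join keeps only rows whose true label is already known, which
# is why the "ground truth-enriched" store has fewer, older rows.
enriched = prediction_logs.merge(ground_truth, on="pokemon_id", how="inner")
accuracy = (enriched["prediction"] == enriched["type1"]).mean()
print(enriched)
print(f"Accuracy on reconciled rows: {accuracy:.2f}")
```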

See also

To gain experience computing both kinds of monitoring, see Tutorial | Model monitoring with a model evaluation store.

Model vs. data monitoring#

Although our focus here is model monitoring, recognize that it is only one part of a robustly managed production project. The same tools (metrics, data quality rules, checks, and scenarios) should also be applied to objects like datasets and managed folders, as these are the upstream inputs to saved models and the Evaluate recipe.
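For instance, a minimal sketch of an upstream data check might look like the following, assuming it runs in a Python recipe or notebook inside this project. In practice you would typically configure equivalent checks as dataset metrics and data quality rules in the UI; the column names and threshold here are assumptions.

```python
# Minimal sketch of an upstream data check, run inside the project.
import dataiku

df = dataiku.Dataset("pokemon_for_scoring").get_dataframe()

# Fail fast if expected feature columns are missing (names assumed).
expected_columns = {"attack", "defense", "hp", "speed"}
missing = expected_columns - set(df.columns)
assert not missing, f"Missing columns: {missing}"

# Flag columns whose share of empty values exceeds an example threshold.
null_rates = df.isna().mean()
too_sparse = null_rates[null_rates > 0.2]
if not too_sparse.empty:
    print("Columns exceeding the 20% missing-value threshold:")
    print(too_sparse)
```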

See also

You can learn more about automation tools in Dataiku in the reference documentation or Knowledge Base.

Deployment contexts#

Now that you have set up your project, move on to any of the following model monitoring examples based on your interests. They are independent of each other and can be completed in any order.