Tutorial | Model monitoring in different contexts#

Get started#

When deploying a model to production, monitoring is an important topic to tackle upfront. The first step is to define what you want to monitor, why, and what should happen when a problem is detected, using the usual tools: metrics, data quality rules, checks, and scenarios.

Once you have a good understanding of your requirements, the next step is the implementation. At this stage, following Dataiku’s resources on MLOps, such as the MLOps learning path, is recommended for a good understanding of the features at play.

However, the ML landscape within your organization might be heterogeneous. You might have models running in various contexts: some fully inside Dataiku’s ecosystem and others outside — through model exports or external deployments.

Nevertheless, even in contexts where model scoring is done outside Dataiku, model monitoring can still be done inside Dataiku.

Objectives#

This tutorial explains how to design a model monitoring feedback loop in several different contexts.

The first two cases demonstrate model scoring and monitoring entirely within Dataiku:

  • A deployed model scored with a batch Flow

  • A deployed model scored as an API endpoint

The last two demonstrate model monitoring within Dataiku in situations where model scoring is done outside Dataiku:

  • A model exported in Java

  • A model exported in Python

Dataiku screenshot of the Flow for all monitoring contexts.

Prerequisites#

To focus on the choices of model monitoring in different contexts, we have simplified the configuration for these cases to the greatest degree possible.

For any of the above cases, you’ll need:

  • Dataiku 12.0 or later.

  • A Full Designer user profile on the Dataiku for AI/ML or Enterprise AI packages.

  • Broad knowledge of Dataiku (Core Designer + ML Practitioner level or equivalent).

Each of the cases listed above may have additional specific requirements found at the beginning of each section.

Create the project#

  1. From the Dataiku Design homepage, click + New Project.

  2. Select Learning projects.

  3. Search for and select Model Monitoring Contexts.

  4. Click Install.

  5. From the project homepage, click Go to Flow (or g + f).

Note

You can also download the starter project from this website and import it as a zip file.

Use case summary#

The starter project is based on the Kaggle Pokemon dataset. First review the Design Flow zone.

  • Every row in the pokemon dataset is a different Pokemon, with columns representing dozens of characteristics and abilities.

  • Every Pokemon belongs to one of eighteen different types (represented as type1 in the dataset), such as water, normal, grass, etc.

  • After some basic data cleaning in the Prepare recipe, we have built a standard multi-class prediction model to predict the type of Pokemon using Dataiku’s AutoML tool, and then deployed it to the Flow.

Once you understand the basic use case at hand, build the Flow before moving ahead to the monitoring instructions.

  1. From the corner of the Design Flow zone, click Build.

  2. Click Build once more to build the pipeline ending with the prediction model.

Dataiku screenshot of the dialog for building the Design Flow zone.

Ground truth vs. input drift monitoring#

To simplify matters, in all of the monitoring contexts to be presented, we have chosen to demonstrate input drift monitoring as opposed to ground truth monitoring.

If you examine the pokemon_for_scoring dataset, you’ll see that the target variable type1 is removed in the Prepare recipe. Our working assumption is that the true answers for the model’s predictions are unknown. Accordingly, all Evaluate recipes skip the computation of performance metrics.

Dataiku screenshot of the settings of an Evaluate recipe.

Due to the differences between these two different types of monitoring, your Flow might build multiple model evaluation stores for a single model. For example:

  • One Flow zone builds a model evaluation store with just prediction logs that monitors only input data and prediction drift. This scenario might run every day.

  • In parallel, another Flow zone builds a model evaluation store with “ground truth-enriched” prediction logs that also monitors performance drift. Depending on the complications of reconciling ground truth, this data may have fewer rows or be older. This scenario might run every month.
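To make the idea of prediction drift more concrete, here is a minimal sketch that compares the distribution of predicted classes between a reference batch and a more recent batch using the Population Stability Index. It is an illustration only: it is not the drift computation performed by the Evaluate recipe, and the function names and sample values are assumptions made for this example.

```python
import math
from collections import Counter


def category_shares(values, categories, eps=1e-6):
    """Return the share of each category, floored at eps to avoid log(0)."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return {c: max(counts[c] / len(values), eps) for c in categories}


def population_stability_index(reference, current):
    """PSI between two samples of a categorical variable (0 means identical)."""
    categories = sorted(set(reference) | set(current))
    ref = category_shares(reference, categories)
    cur = category_shares(current, categories)
    return sum((cur[c] - ref[c]) * math.log(cur[c] / ref[c]) for c in categories)


# Hypothetical predicted types from two scoring runs.
reference_predictions = ["water", "water", "fire", "grass"]
recent_predictions = ["water", "fire", "fire", "fire"]

print(population_stability_index(reference_predictions, reference_predictions))
print(population_stability_index(reference_predictions, recent_predictions))
```

A value of zero means the two distributions match; larger values indicate a larger shift. Which value should trigger an alert depends on the use case.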

See also

To gain experience computing both kinds of monitoring, see Tutorial | Model monitoring with a model evaluation store.

Model vs. data monitoring#

Although our focus here is model monitoring, you should recognize that model monitoring is only one leg of a robustly-managed production project. The same tools of metrics, data quality rules, checks, and scenarios should also be applied to objects like datasets and managed folders, as they are the upstream inputs to saved models and the Evaluate recipe.

See also

You can learn more about automation tools in Dataiku in the reference documentation or Knowledge Base.

Deployment contexts#

Now that you have set up your project, move on to any of the following model monitoring examples based on your interests. They can be completed in any order independently of each other.

A batch workflow within Dataiku#

Many data science workloads call for a batch deployment framework.

As a means of comparison to other deployment contexts, this section presents how to monitor a model under a batch deployment framework staying entirely within Dataiku.

Additional prerequisites#

For this case, you’ll only need to satisfy the requirements included in the shared prerequisites, including creating the starter project found there.

Score data within Dataiku#

For this case, we’ll be using the Dataiku Monitoring (Batch) Flow zone found in the starter project.

  1. In the Dataiku Monitoring (Batch) Flow zone, select the pokemon_scored_dss dataset.

  2. Click Build > Build Dataset with the Build Only This setting to run the Score recipe.

  3. Before moving to the monitoring setup, examine the schema of the output to the Score recipe compared to the input. You should notice the addition of a prediction column containing the predicted type of Pokemon.

Dataiku screenshot of the output dataset from a Score recipe.

Note

You’ll notice that, in addition to a prediction column, the schema of the pokemon_scored_dss dataset includes four columns beginning with smmd_. This is because, in the parent Score recipe, we’ve chosen to output model metadata.

Monitor model metrics#

The monitoring setup in this case is the same as that presented in Tutorial | Model monitoring with a model evaluation store, but we can review the core tenets for completeness.

In Dataiku’s world, the Evaluate recipe takes a saved model and an input dataset of predictions, computes model monitoring metrics, and stores them in a model evaluation store (MES). In this case, we assume that we do not have ground truth, and so are not computing performance metrics.

  1. In the Dataiku Monitoring (Batch) Flow zone, open the empty model evaluation store called Monitoring - DSS Automation.

  2. From the Actions sidebar, click Build > Build Evaluation Store, thereby running the Evaluate recipe.

  3. When the job finishes, refresh the page to find one model evaluation.

Dataiku screenshot of a model evaluation store with one model evaluation.

See also

Review the reference documentation on model evaluations if this is unfamiliar to you.

Automate model monitoring#

The same automation toolkit of metrics, checks, and scenarios that you find for Dataiku objects like datasets is also available for model evaluation stores.

  1. Within the Monitoring - DSS Automation MES, navigate to the Settings tab.

  2. Go to the Status Checks subtab.

  3. Observe the example native and Python checks based on the data drift metric.

Dataiku screenshot of a status check for a MES metric.

With acceptable limits for each chosen metric formally defined in checks, you can then leverage these objects into a scenario, such as the Monitor batch job scenario included in the project, that:

  • Computes the model evaluation with the data at hand.

  • Runs checks to determine if the metrics have exceeded the defined threshold.

  • Sends alerts, or triggers other actions, if any checks return an error.
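The logic of such a scenario can be sketched in plain Python. The snippet below is not Dataiku's check API: the function names and the threshold values are hypothetical, chosen only to show how a drift metric could be mapped to a status and how an error status could drive an alert.

```python
def evaluate_drift_check(drift_value, warning_threshold=0.1, error_threshold=0.2):
    """Map a drift metric value to a (status, message) pair."""
    if drift_value is None:
        return "EMPTY", "No drift value was computed"
    if drift_value >= error_threshold:
        return "ERROR", f"Drift {drift_value:.3f} reached the error threshold {error_threshold}"
    if drift_value >= warning_threshold:
        return "WARNING", f"Drift {drift_value:.3f} reached the warning threshold {warning_threshold}"
    return "OK", f"Drift {drift_value:.3f} is within the accepted range"


def should_alert(check_results):
    """Alert as soon as any check returns ERROR."""
    return any(status == "ERROR" for status, _ in check_results)


# Hypothetical drift values from three successive model evaluations.
results = [evaluate_drift_check(value) for value in (0.04, 0.12, 0.31)]
for status, message in results:
    print(status, message)
print("Send alert:", should_alert(results))
```

Keeping the thresholds as parameters makes it easy to tune them once you know how much drift is normal for your data.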

Dataiku screenshot of a sample scenario using a MES check.

Note

The Monitor batch job scenario found in the project uses a Microsoft Teams webhook, but many other reporters are available.

You’ll also notice that the scenario has no trigger attached. Determining how often your scenario should run is highly dependent on your specific use case, but you’ll want to make sure you have enough data for significant comparisons.

Push to the Automation node#

This article presents the basis of building a working operationalized project that will automatically batch score, monitor, and alert. Although simple, it highlights the main components to use such as the Evaluate recipe, the model evaluation store, and scenarios controlled by metrics, checks, and/or data quality rules.

In this simplified example, we performed both scoring and monitoring on the Design node. However, in a real-life batch use case contained within Dataiku’s universe, both scoring and monitoring should be done on an Automation node. A true production environment, separate from the development environment, is required in order to produce a consistent and reliable Flow.

Accordingly, the next steps would be to:

  • Create a project bundle on the Design node.

  • Publish the bundle to the Project Deployer.

  • Deploy the bundle to the Automation node.

  • Run scenarios on the Automation node for both the scoring and monitoring — the entire Flow zone Dataiku Monitoring (Batch).
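The first three steps above can also be scripted with the Dataiku Python API client. The following is a minimal sketch, assuming a `dataikuapi.DSSClient` connected to the Design node is passed in as `client`; the project key, bundle ID, infrastructure ID, and deployment ID are placeholders to replace with your own values, and the exact behavior may vary with your Dataiku version.

```python
def deploy_bundle(client, project_key, bundle_id, infra_id, deployment_id):
    """Export a bundle, publish it, and deploy it to an Automation node.

    `client` is expected to behave like a dataikuapi.DSSClient, for example:
        client = dataikuapi.DSSClient("https://design-node.example.com", "API_KEY")
    (hypothetical host and key).
    """
    project = client.get_project(project_key)
    project.export_bundle(bundle_id)    # create the bundle on the Design node
    project.publish_bundle(bundle_id)   # publish it to the Project Deployer

    deployer = client.get_projectdeployer()
    deployment = deployer.create_deployment(
        deployment_id, project_key, infra_id, bundle_id
    )
    update = deployment.start_update()  # push the bundle to the Automation node
    update.wait_for_result()
    return deployment
```

Running the scenarios on the Automation node is then a separate step, performed against the Automation node itself.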

Dataiku screenshot of a bundle on the Project Deployer.

Tip

Follow the Tutorial | Batch deployment for a walkthrough of these steps.

An API endpoint on a Dataiku API node#

Many data science workloads call for a real-time API framework, where queries sent to an API endpoint receive an immediate response.

As a means of comparison to other deployment contexts, this section presents how to monitor a model under a real-time API framework staying entirely within Dataiku.

Additional prerequisites#

In addition to the shared prerequisites, you’ll also need:

Deploy the model as an API endpoint#

The starter project already contains the API endpoint that we want to monitor, and so the next step is pushing a version of an API service including the endpoint to the API Deployer.

  1. From the top navigation bar, navigate to More Options (the horizontal dots icon) > API Designer.

  2. Open the pokemon API service.

  3. Note how it includes one prediction endpoint called guess using the model found in the Flow.

  4. Click Publish on Deployer and Publish to confirm publishing v1 of the service to the API Deployer.

Dataiku screenshot of an API service with a prediction endpoint.

Once the API service exists on the API Deployer, we can deploy the service to an infrastructure.

  1. From the waffle menu in the top navigation bar, click Local (or Remote) Deployer.

  2. Click Deploying API Services.

  3. In the Deployments tab of the API Deployer, find the API version that you just pushed to the API Deployer, and click Deploy.

  4. If not already selected, choose an infrastructure.

  5. Click Deploy and Deploy again to confirm.

Dataiku screenshot of an API deployment.

Note

To review the mechanics of real-time API deployment in greater detail, please see Tutorial | Real-time API deployment.

Generate activity on the API endpoint#

Before we set up the monitoring portion of this project, we need to generate some activity on the API endpoint so that we have actual data on the API node to retrieve in the feedback loop.

  1. Within the Status tab of the deployment, navigate to the Run and test subtab for the guess endpoint.

  2. Click Run All to send several test queries to the API node.
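Test queries can also be sent from code instead of the Run and test subtab. The sketch below assumes a client that behaves like `dataikuapi.APINodeClient`, created for the pokemon service; the API node URL and the feature names in the sample records are hypothetical and must match your own deployment and model.

```python
def send_test_queries(client, endpoint_id, records):
    """Send each record to a prediction endpoint and collect the raw responses.

    `client` is expected to behave like a dataikuapi.APINodeClient, for example:
        client = dataikuapi.APINodeClient("https://api-node.example.com", "pokemon")
    (hypothetical URL).
    """
    responses = []
    for record in records:
        # One prediction request per record, as a dictionary of features.
        responses.append(client.predict_record(endpoint_id, record))
    return responses


# Hypothetical feature values; use the features your model was trained on.
sample_records = [
    {"attack": 49, "defense": 49, "hp": 45, "speed": 45},
    {"attack": 52, "defense": 43, "hp": 39, "speed": 65},
]
# responses = send_test_queries(client, "guess", sample_records)
```

Each call produces one row in the API node logs, which is the data retrieved by the feedback loop.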

Dataiku screenshot of test queries of an API deployment.

Create a feedback loop on the API endpoint#

Now direct your attention to the Dataiku Monitoring (API) Flow zone. Just like the batch Flow zone, we have an Evaluate recipe that takes two inputs (a dataset of predictions and a saved model) and outputs a model evaluation store. However, there are two subtle differences.

Dataiku screenshot of a Flow zone for monitoring API node log data.

API node log data#

The input data in this context comes directly from the API node. As explained in Tutorial | API endpoint monitoring, the storage location of this data differs for Dataiku Cloud and self-managed users.

  1. Follow the steps in Audit trail on Dataiku Cloud to access API node queries.

  2. Once you’ve imported this dataset, replace pokemon_on_static_api_logs with the apinode_audit_logs dataset as the input to the Evaluate recipe in the Dataiku Monitoring (API) Flow zone.

After pointing this dataset to the correct prediction logs, we can now explore it. Each row is an actual prediction request answered by our model. You can find all the features that were requested, the resulting prediction, with details and other technical data.

Dataiku screenshot of the Explore tab of API node log data fetched from the Event server.

Warning

Although we are showing a local filesystem storage for the API node logs to make the project import easier, in a real situation, any file-based cloud storage is highly recommended. This data can grow quickly, and it will not decrease unless explicitly truncated.

It would also be common to activate partitioning for this dataset.

The Evaluate recipe with API node logs as input#

Another subtle difference between the Evaluate recipe in the API Flow zone compared to the Batch Flow zone is the option to automatically handle the input data as API node logs.

With this option activated (it is detected by default), you do not need to handle the additional log columns or their naming yourself.

  1. Open the Evaluate recipe in the Dataiku Monitoring (API) Flow Zone.

  2. Confirm that the input dataset will be handled as API node logs.

  3. Click Run to produce a model evaluation of the API node logs.

Dataiku screenshot of an Evaluate recipe with API node log input data.

Note

If using a version of Dataiku prior to 11.2, you will need to add a Prepare recipe to keep only the features and prediction columns, and rename them to match the initial training dataset convention.
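To illustrate what such a preparation step does, here is a minimal sketch in plain Python. It assumes that feature columns in the log carry a `message.features.` prefix and that the prediction is stored in `message.result.prediction`; check these names against your own log dataset, as they are assumptions made for this example.

```python
FEATURE_PREFIX = "message.features."             # assumed log column prefix
PREDICTION_COLUMN = "message.result.prediction"  # assumed log column name


def to_training_schema(log_row):
    """Keep only feature and prediction columns, renamed to the training convention."""
    row = {}
    for name, value in log_row.items():
        if name.startswith(FEATURE_PREFIX):
            row[name[len(FEATURE_PREFIX):]] = value
        elif name == PREDICTION_COLUMN:
            row["prediction"] = value
    return row


# Hypothetical log row with one technical column that should be dropped.
log_row = {
    "timestamp": "2024-01-02T03:04:05Z",
    "message.features.attack": 49,
    "message.features.defense": 49,
    "message.result.prediction": "grass",
}
print(to_training_schema(log_row))
```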

Create a one-click monitoring loop#

After understanding these details, you should also be aware that since version 12, users can simplify this process by building the entire feedback loop directly from the API endpoint in the API Designer.

  1. From the top navigation bar of the Design node, navigate to More Options (the horizontal dots icon) > API Designer.

  2. Open the pokemon API service.

  3. Navigate to the Monitoring panel for the guess endpoint.

  4. Click Configure to create a monitoring loop for this endpoint.

  5. Click OK, and then return to the Flow to see the new zone, which, in this case, duplicates the work of the existing Dataiku Monitoring (API) Flow zone.

Dataiku screenshot of the Monitoring panel within the API endpoint of the API Designer.

An exported Python model scored externally#

In some cases, we may wish to export a model designed in Dataiku so that it can be run on an external system — for example, within a more complete pipeline where other tasks are performed before and after the call to the model is done. However, even if model scoring is done externally, model monitoring can still be done within Dataiku.

As a means of comparison to other deployment contexts, this section presents how to monitor a model within Dataiku in situations where the model is exported to Python and scored externally.

Additional prerequisites#

In addition to the shared prerequisites, you’ll also need:

Export the model#

For scoring to occur outside of Dataiku, we’ll first need to export the model used in the starter project.

  1. Double click to open the saved model deployed to the Flow.

  2. Open the report for the Active version by clicking on the model name Random forest (s1) - v3 at the top left of the tile.

  3. At the top right of the screen, click Actions > Export model as ….

  4. Select the Python panel.

  5. Click Export Model.

Dataiku screenshot of the dialog for exporting a Python model.

Note

See the reference documentation on Python model exports to understand the requirements, usage, and limitations.

Run the model outside of Dataiku#

This action downloaded a zip file to your machine. It contains the components needed for scoring directly with Python outside of Dataiku.

Inspect the downloaded package#

Let’s take a look at what Dataiku has provided.

  1. Unzip the downloaded package to find:

    File name          Contents
    model.zip          The exported model
    requirements.txt   The Python module required to run the model
    sample.py          A sample script for making predictions

Set up the environment#

The next step is to check that your environment and downloaded model are ready by running a sample script.

  1. On the terminal, navigate to the directory holding these files.

  2. Create a virtual environment for your tests.

    virtualenv python-export
    
  3. Activate that environment.

    source python-export/bin/activate
    

You’ll need to make two small adjustments for the sake of this tutorial.

  1. Open the requirements.txt file, and remove the specific version requirements on the dataiku-scoring package.

  2. Add pandas as a second requirement. (This isn’t mandatory for scoring, but will be used in our Python script later).

  3. Load the requirements file.

    pip install -r requirements.txt
    
  4. Once that is set up, call the sample script to validate the environment.

    python sample.py
    

This should output the following:

Output of model.predict():
array(['fire', 'fire'], dtype=object)
Output of model.predict_proba():
{'bug': array([0.02705888, 0.0306321 ]),
 'dark': array([0.05454764, 0.03727579]),
 'dragon': array([0.07957995, 0.00496544]),
 'electric': array([0.06280624, 0.06476114]),
 'fairy': array([0.02217147, 0.03600387]),
 'fighting': array([0.05453975, 0.06410458]),
 'fire': array([0.15311388, 0.24131331]),
 'flying': array([0.0058496 , 0.00308777]),
 'ghost': array([0.04494048, 0.029513  ]),
 'grass': array([0.1031577, 0.1232584]),
 'ground': array([0.04200412, 0.02563218]),
 'ice': array([0.03195237, 0.03471062]),
 'normal': array([0.03372282, 0.0405713 ]),
 'poison': array([0.04058422, 0.06011815]),
 'psychic': array([0.04955909, 0.06700692]),
 'rock': array([0.05377793, 0.0422824 ]),
 'steel': array([0.04674354, 0.00999445]),
 'water': array([0.09389033, 0.08476859])}
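In this sample output, the class returned by model.predict() for each row is the one with the highest probability in model.predict_proba(). The short sketch below reproduces that relationship on three of the classes shown above; the helper name `top_classes` is introduced for this example only.

```python
def top_classes(probas):
    """Return, for each row, the class label with the highest probability."""
    n_rows = len(next(iter(probas.values())))
    return [max(probas, key=lambda label: probas[label][i]) for i in range(n_rows)]


# A subset of the probabilities printed by sample.py (one list entry per row).
sample_probas = {
    "fire": [0.15311388, 0.24131331],
    "grass": [0.1031577, 0.1232584],
    "water": [0.09389033, 0.08476859],
}
print(top_classes(sample_probas))  # ['fire', 'fire']
```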

Export data for scoring#

Now that we’ve verified a working Python model, we’ll also need a batch of data prepared for scoring. We already have this in the Dataiku project.

  1. Return to your project in Dataiku.

  2. In the Design Flow zone, select the pokemon_for_scoring dataset.

  3. If it is empty, click Build > Build Dataset with the default Build Only This setting.

  4. Once the dataset is built, click Export in the Actions tab.

  5. Click Download to save a local CSV file of the data ready to be scored to your machine.

  6. Move it to the directory containing model.zip.

Dataiku screenshot of the dialog for exporting the data needed for scoring.

Score data with a Python script#

Now that we have a Python model and data for scoring, let’s make a script to generate predictions for that data using the model.

  1. Create a file called scoring_dataiku.py with the following code:

    from dataikuscoring import load_model
    import pandas as pd
    
    # Load the model from the export in the current directory
    model = load_model('model.zip')
    
    # Read the batch of data exported from Dataiku
    input_df = pd.read_csv('pokemon_for_scoring.csv')
    predict_result = model.predict(input_df)
    
    # Work on a copy so the input DataFrame is left unchanged
    output_df = input_df.copy()
    output_df['prediction'] = predict_result
    print("Output of model.predict(): {}".format(output_df))
    
    # Save the prediction log for import into Dataiku
    output_df.to_csv('pokemon_scored_python.csv', index=False)
    
  2. Move the scoring_dataiku.py file to the directory containing model.zip and pokemon_for_scoring.csv.

  3. Generate predictions on the entire dataset by running:

    python scoring_dataiku.py
    

    This action should create the following output and a CSV file called pokemon_scored_python.csv.

Terminal screenshot of output after running the scoring script.
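Before importing the file into Dataiku, a quick sanity check can confirm that every row received a prediction. The following sketch uses only the standard library; the function name and the sample rows (including the `attack` and `defense` column names) are hypothetical, while the `prediction` column name comes from the scoring script above.

```python
import csv


def check_prediction_log(lines, prediction_column="prediction"):
    """Return (row_count, empty_predictions) for a CSV prediction log.

    `lines` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(lines)
    if prediction_column not in (reader.fieldnames or []):
        raise ValueError(f"Column {prediction_column!r} not found in the file header")
    row_count = 0
    empty_predictions = 0
    for row in reader:
        row_count += 1
        if not (row[prediction_column] or "").strip():
            empty_predictions += 1
    return row_count, empty_predictions


# Usage with the file written by the scoring script:
# with open("pokemon_scored_python.csv", newline="") as handle:
#     print(check_prediction_log(handle))
sample = ["attack,defense,prediction", "49,49,grass", "52,43,"]
print(check_prediction_log(sample))  # (2, 1)
```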

Import prediction logs for monitoring#

We now have used the model to make predictions and exported these predictions in a usable format. Next we need to import the prediction data into Dataiku so that it can be monitored in the usual way.

  1. In the Python Monitoring Flow zone, open the predictions_python dataset.

  2. Navigate to the Settings tab.

  3. Delete the existing CSV file.

  4. Click Select Files to replace it with the pokemon_scored_python.csv file that you just created.

  5. Click Save to confirm.

Dataiku screenshot of the settings tab of a dataset of predictions scored with Python.

Compute a model evaluation#

Like the other monitoring zones in this project, the Python Monitoring Flow zone includes:

  • An Evaluate recipe with two inputs: the saved model and a dataset of prediction logs (this time created externally with Python).

  • A model evaluation store as output computing standard drift metrics between the training dataset of the model and the actual predictions.

Let’s build the model evaluation store to check the drift of the input data and predictions computed externally with Python.

  1. In the Python Monitoring Flow zone, open the empty Monitoring - Python Export model evaluation store.

  2. In the Actions tab, click Build > Build Evaluation Store with the default Build Only This setting.

  3. When finished building, refresh the page to find the same set of metrics you’d find if you built the MES in other Flow zones.

Dataiku screenshot of a model evaluation store on data scored externally with Python.

Automate model monitoring#

At this point, you have seen an example of how a model export can generate a log file usable to compute monitoring metrics.

In a real use case, the first point to solve is how to automatically move the prediction file from where it is generated to a place accessible to Dataiku. Possible solutions include sending it via FTP or directly pushing it to cloud storage. Rather than an all-around solution, this problem should be analyzed on a case-by-case basis.

Once you have configured the retrieval of logs in an accessible place, you can create a simple scenario to run the Evaluate recipe and generate a model evaluation, which you can then enrich with checks to automate alerts as done in this project’s Monitor batch job scenario or explained in more detail in Tutorial | Model monitoring with a model evaluation store.
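As one possible starting point for the file transfer, the sketch below copies a prediction log into a landing folder under a timestamped name, so that successive scoring runs do not overwrite each other. The landing folder stands in for whatever location Dataiku can read in your setup (a mounted share or a synchronized bucket, for example); the function name and the naming pattern are assumptions made for this example.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def publish_prediction_log(source_file, landing_dir, now=None):
    """Copy a prediction log into landing_dir under a timestamped name.

    `now` is an optional UTC datetime, mainly useful for testing.
    """
    source = Path(source_file)
    landing = Path(landing_dir)
    landing.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    target = landing / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)
    return target


# Demonstration in a temporary directory so nothing is written elsewhere.
with tempfile.TemporaryDirectory() as workdir:
    log = Path(workdir) / "pokemon_scored_python.csv"
    log.write_text("prediction\nfire\n")
    copied = publish_prediction_log(log, Path(workdir) / "landing")
    print(copied.name)
```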

Automate model deployment (optional)#

It is perfectly acceptable to keep the deployment of the model as a manual process, as deploying a new model version might not be a very frequent operation. However, if you want to automate it, you can build a full deployment pipeline by leveraging Dataiku APIs, typically by training a new model version and then downloading the exported Python model.

Note

See the Developer Guide for a more detailed example that you can adapt for your needs.

An exported Java model scored externally#

In some cases, we may wish to export a model designed in Dataiku so that it can be run on an external system — for example, to embed the model on an edge device, such as a drone. However, even if model scoring is done externally, model monitoring can still be done within Dataiku.

As a means of comparison to other deployment contexts, this section presents how to monitor a model within Dataiku in situations where the model is exported to Java and scored externally.

Additional prerequisites#

In addition to the shared prerequisites, you’ll also need:

  • Basic knowledge of Java.

  • A JDK installed on your machine.

  • A Dataiku instance with the Java model export feature.

  • An IDE. We used IntelliJ IDEA Community edition, but other IDEs are possible.

Export the model#

For scoring to occur outside of Dataiku, we’ll first need to export the model used in the starter project.

  1. Double click to open the saved model deployed to the Flow.

  2. Open the report for the Active version by clicking on the model name Random forest (s1) - v3 at the top left of the tile.

  3. At the top right of the screen, click Actions > Export model as ….

  4. Select the Java panel.

  5. Enter the class name com.dataiku.sample.GreatModel.

  6. With the default full format selected, click Export Model.

Dataiku screenshot of the dialog for exporting a Java model.

Note

See the reference documentation on Java model exports to understand the usage and limitations.

Run the model outside of Dataiku#

You have now downloaded to your machine a .jar file containing the model and the Dataiku Java scoring libraries. To run the model outside of Dataiku, we need a Java program that loads the model, takes input data from a folder, scores it, and exports the result as a CSV file.

Import the project to an IDE#

We have provided a sample Java project in the monitoring-java folder of the academy-samples git repository that meets the above requirements.

  1. Import this project into your favorite IDE. (We used IntelliJ IDEA Community edition for this test).

The program is made of two files:

  • java/src/com/dataiku/exports/ModelPredictor.java loads the model and scores with it.

  • java/src/com/dataiku/exports/ModelRunner.java loads the input data, sends it for scoring, and saves the output as a CSV file.

The input data is expected within the project in the java/input folder. We’ve already provided a sample file, but you could generate any file you want, provided you respect the schema.

IDE screenshot of the Java project after importing it.

Add the exported jar file as a library to the project#

If using IntelliJ IDEA, the only missing part is the model itself and the scoring library: in other words, the .jar file previously exported from Dataiku.

  1. Add the .jar file as a library in the project.

IDE screenshot of the jar file as a library.

Run the program#

Now, you just need to run the program to generate the prediction log.

  1. In IntelliJ, right-click on ModelRunner.

  2. Select Run ‘ModelRunner.main()’.

IDE screenshot of the model runner dialog.

This action should create a CSV file of scored data called pokemon_scored_java.csv. The full path will be java/output/java/pokemon_scored_java.csv. This file is the only thing we need to compute drift monitoring in Dataiku.

Import prediction logs for monitoring#

We now have used the model to make predictions and exported these predictions in a usable format. Next we need to import the prediction data into Dataiku so that it can be monitored in the usual way.

  1. In the Java Monitoring Flow zone, open the dataset of predictions that feeds the Evaluate recipe.

  2. Navigate to the Settings tab.

  3. Delete the existing CSV file.

  4. Click Select Files to replace it with the pokemon_scored_java.csv file that you just created.

  5. Click Save to confirm.

Dataiku screenshot of the settings tab of a dataset of predictions scored with Java.

Compute a model evaluation#

Like the other monitoring zones in this project, the Java Monitoring Flow zone includes:

  • An Evaluate recipe with two inputs: the saved model and a dataset of prediction logs (this time created externally with Java).

  • A model evaluation store as output computing standard drift metrics between the training dataset of the model and the actual predictions (this time done externally with Java).

Finally, let’s build the model evaluation store to check the drift of the input data and predictions computed externally with Java.

  1. In the Java Monitoring Flow zone, open the empty Monitoring - Java Export model evaluation store.

  2. In the Actions tab, click Build > Build Evaluation Store with the default Build Only This setting.

  3. When finished building, refresh the page to find the same set of metrics you’d find if you built the MES in other Flow zones.

Dataiku screenshot of a model evaluation store on data scored externally with Java.

Automate model monitoring#

At this point, you have seen an example of how a model export can generate a log file usable to compute monitoring metrics.

In a real use case, the first point to solve is how to automatically move the prediction file from where it is generated to a place accessible to Dataiku. Possible solutions include sending it via FTP or directly pushing it to cloud storage. Rather than an all-around solution, this problem should be analyzed on a case-by-case basis.

Once you have configured the retrieval of logs in an accessible place, you can create a simple scenario to run the Evaluate recipe and generate a model evaluation, which you can then enrich with checks to automate alerts as done in this project’s Monitor batch job scenario or explained in more detail in Tutorial | Model monitoring with a model evaluation store.
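Once such a scenario exists, it can also be triggered from outside Dataiku, for instance at the end of the external scoring job. The sketch below assumes a project handle obtained from the Dataiku Python API client and a scenario ID of your own; the ID `MONITOR_JAVA_EXPORT` is a placeholder, not a scenario included in the starter project.

```python
def run_monitoring_scenario(project, scenario_id):
    """Run a scenario, wait for it to finish, and return its outcome.

    `project` is expected to behave like a dataikuapi project handle, for example:
        project = dataikuapi.DSSClient(host, api_key).get_project("PROJECT_KEY")
    (hypothetical host, key, and project key).
    """
    scenario = project.get_scenario(scenario_id)
    # no_fail=True returns the run even when the scenario fails,
    # so the caller can decide how to react.
    run = scenario.run_and_wait(no_fail=True)
    return run.outcome


def needs_attention(outcome):
    """Treat anything other than a successful run as worth investigating."""
    return outcome != "SUCCESS"


# outcome = run_monitoring_scenario(project, "MONITOR_JAVA_EXPORT")
# if needs_attention(outcome):
#     print("Monitoring scenario finished with outcome:", outcome)
```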

Automate model deployment (optional)#

It is perfectly acceptable to keep the deployment of the model as a manual process, as deploying a new model version might not be a very frequent operation. However, if you want to do so, you can build a full deployment pipeline by leveraging Dataiku APIs, typically by training a new model version and then downloading the jar file.

Note

See the Developer Guide for a more detailed example that you can adapt for your needs.

What’s next?#

This series of articles has presented the basic setup to monitor a model in heterogeneous ML landscapes.

Not only does Dataiku fully support batch and real-time API workloads within its ecosystem, it also accommodates more diverse ML pipelines where scoring may be done externally using Python or Java, for example.

Tip

If you do have a heterogeneous ML landscape, you may be interested in trying Tutorial | Surface external models within Dataiku.

See also

See the reference documentation on MLOps to learn more.