Tutorial | Monitoring models: An exported Java model scored externally#

In some cases, we may wish to export a model designed in Dataiku so that it can be run on an external system — for example, to embed the model on an edge device, such as a drone. However, even if model scoring is done externally, model monitoring can still be done within Dataiku.

For comparison with other deployment contexts, this article demonstrates how to monitor a model within Dataiku when the model is exported to Java and scored externally.

Objectives#

In this tutorial, you will:

  • Create a model monitoring feedback loop inside Dataiku using a Java model export that is scored externally.

Prerequisites#

In addition to the prerequisites laid out in the introduction, you’ll also need:

  • Basic knowledge of Java.

  • A JDK installed on your machine.

  • A Dataiku instance with the Java model export feature activated (limited to Enterprise-level licenses).

  • A Java IDE. We used IntelliJ IDEA Community Edition, but other IDEs will work.

Export the model#

For scoring to occur outside of Dataiku, we’ll first need to export the model used in the starter project.

  1. Double-click to open the saved model deployed to the Flow.

  2. Open the report for the Active version by clicking on the model name Random forest (s1) - v2 at the top left of the tile.

  3. At the top right of the screen, click Actions > Export model as ….

  4. Select the Java panel, and enter the Class name com.dataiku.sample.GreatModel.

  5. With the default full format selected, click OK.

Dataiku screenshot of the dialog for exporting a Java model.

Note

See the reference documentation on Java model exports to understand the usage and limitations.

Run the model outside of Dataiku#

You now have a .jar file downloaded to your machine containing the model and the Dataiku Java scoring libraries. To run the model outside of Dataiku, we need a Java program that loads the model, reads input data from a folder, scores it, and exports the result as a CSV file.

Import the project to an IDE#

We have provided a sample Java project in the monitoring-java folder of the academy-samples git repository that meets the above requirements.

  1. Import this project into your favorite IDE. (We used IntelliJ IDEA Community Edition for this test.)

The program is made of two files:

  • java/src/com/dataiku/exports/ModelPredictor.java loads the model and scores with it.

  • java/src/com/dataiku/exports/ModelRunner.java loads the input data, sends it for scoring, and saves the output as a CSV file.

The input data is expected within the project in the java/input folder. We’ve already provided a sample file, but you could generate any file you want, provided you respect the schema.

IDE screenshot of the Java project after importing it.
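To make this division of labor concrete, here is a minimal sketch of the runner's overall shape. It is illustrative only: the input file name and the scoreRow placeholder are assumptions made for the sketch, and the real calls into the Dataiku scoring classes packaged in the exported .jar live in the sample project's ModelPredictor.java.

    package com.dataiku.exports;

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    // Illustrative sketch of the runner's shape; see the real ModelRunner.java
    // and ModelPredictor.java in the sample project for the actual scoring calls.
    public class ModelRunner {

        public static void main(String[] args) throws IOException {
            Path input = Paths.get("java/input/pokemon.csv");  // assumed input file name
            Path output = Paths.get("java/output/java/pokemon_scored_java.csv");
            Files.createDirectories(output.getParent());

            // Read the raw CSV, score each data row, and append a prediction column
            List<String> rows = Files.readAllLines(input);
            StringBuilder scored = new StringBuilder(rows.get(0)).append(",prediction\n");
            for (String row : rows.subList(1, rows.size())) {
                scored.append(row).append(',').append(scoreRow(row)).append('\n');
            }
            Files.writeString(output, scored.toString());
        }

        // Placeholder: in the sample project this delegates to ModelPredictor,
        // which uses com.dataiku.sample.GreatModel from the exported .jar.
        private static String scoreRow(String csvRow) {
            return "0"; // hypothetical constant prediction, for the sketch only
        }
    }

The point to notice is that all of the model logic ships inside the .jar; the surrounding program is ordinary file plumbing.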

Add the exported jar file as a library to the project#

If you are using IntelliJ IDEA, the only missing piece is the model and its scoring library: in other words, the .jar file previously exported from Dataiku.

  1. Add the .jar file as a library in the project.

IDE screenshot of the jar file as a library.

Run the program#

Now, you just need to run the program to generate the prediction log.

  1. In IntelliJ, right-click on ModelRunner.

  2. Select Run ‘ModelRunner.main()’.

IDE screenshot of the model runner dialog.

This action should create a CSV file of scored data called pokemon_scored_java.csv, with the full path java/output/java/pokemon_scored_java.csv. This file is all we need to compute drift monitoring in Dataiku.
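If you want a quick sanity check before returning to Dataiku, a few lines of standard Java can confirm that the file landed where expected. The CheckOutput class below is a hypothetical helper, not part of the sample project:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Optional sanity check on the prediction log produced by ModelRunner
    public class CheckOutput {
        public static void main(String[] args) throws IOException {
            Path out = Paths.get("java/output/java/pokemon_scored_java.csv");
            System.out.println("File exists: " + Files.exists(out));
            System.out.println("Lines (header included): " + Files.readAllLines(out).size());
        }
    }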

Import prediction logs for monitoring#

We have now used the model to make predictions and exported those predictions in a usable format. Next, we need to import the prediction data into Dataiku so that it can be monitored in the usual way.

  1. In the Java Monitoring Flow zone, open the predictions_java dataset.

  2. In the Settings tab, delete the existing CSV file, and replace it with the pokemon_scored_java.csv file that you just created.

Dataiku screenshot of the settings tab of a dataset of predictions scored with Java.

Like the other monitoring zones in this project, the Java Monitoring Flow zone includes:

  • An Evaluate recipe with two inputs: the saved model and a dataset of prediction logs (this time created externally with Java).

  • A model evaluation store as output computing standard drift metrics between the training dataset of the model and the actual predictions (this time done externally with Java).

Finally, let’s build the model evaluation store to check the drift of the input data and predictions computed externally with Java.

  1. Select the Monitoring - Java export model evaluation store.

  2. In the Actions tab, click Build.

  3. With the default non-recursive mode selected, click Build Model Evaluation Store.

  4. When the build finishes, open the MES to find the same set of metrics you’d find if you built the MES in the other Flow zones.

Dataiku screenshot of a model evaluation store on data scored externally with Java.

Automate model monitoring#

At this point, you have seen an example of how a model export can generate a log file that can be used to compute monitoring metrics.

In a real use case, the first problem to solve is how to automatically move the prediction file from where it is generated to a place accessible to Dataiku. Possible solutions include sending it via FTP or pushing it directly to cloud storage. Rather than searching for a one-size-fits-all solution, analyze this problem on a case-by-case basis.
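For instance, if the scoring host has network access, one possible option is to push the log to cloud storage with the AWS SDK for Java v2. In the sketch below, the bucket, key, and region are hypothetical; point them at storage your Dataiku instance is connected to:

    import java.nio.file.Paths;

    import software.amazon.awssdk.core.sync.RequestBody;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.PutObjectRequest;

    // Hypothetical example: push the prediction log to an S3 bucket
    // that a Dataiku dataset can read from.
    public class UploadPredictionLog {
        public static void main(String[] args) {
            try (S3Client s3 = S3Client.builder().region(Region.EU_WEST_1).build()) {
                s3.putObject(
                        PutObjectRequest.builder()
                                .bucket("model-monitoring")          // hypothetical bucket
                                .key("logs/pokemon_scored_java.csv") // hypothetical key
                                .build(),
                        RequestBody.fromFile(Paths.get("java/output/java/pokemon_scored_java.csv")));
            }
        }
    }

A Dataiku dataset pointing at the same storage path could then replace the manual file upload shown in the previous section.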

Once you have configured the retrieval of logs into an accessible place, you can create a simple scenario to run the Evaluate recipe and generate a model evaluation. You can then enrich the scenario with checks to automate alerts, as done in this project’s Monitor batch job scenario, or as explained in more detail in this tutorial.

Automate model deployment (optional)#

It is perfectly acceptable to keep the deployment of the model as a manual process, as deploying a new model version might not be a very frequent operation. However, if you want to automate it, you can build a full deployment pipeline by leveraging Dataiku APIs, typically by training a new model version and then downloading the jar file.

Note

See the Developer Guide for a more detailed example that you can adapt for your needs.

What’s next?#

Having seen the monitoring setup for an exported Java model scored externally, you might want to move on to one of the following monitoring cases that reuse the same project. They can be completed in any order independently of each other.