An exported Python model scored externally#

Depending on the deployment context, you may wish to export a model designed in Dataiku so that it can run on an external system. For example, a model may need to run within a more complete pipeline including tasks before and after the model call. However, even if model scoring occurs externally, model monitoring can still occur within Dataiku.

This section presents how to monitor a model within Dataiku in situations where the model is exported to Python and scored externally.

Additional prerequisites#

In addition to the shared prerequisites, you’ll also need:

  • A local Python environment with a terminal, where you can create a virtual environment and install packages with pip.

Export the model#

For scoring to occur outside of Dataiku, you’ll first need to export the model used in the starter project.

  1. Double-click the saved model deployed in the Flow to open it.

  2. Open the report for the Active version by clicking on the model name Random forest (s1) - v3 at the top left of the tile.

  3. At the top right of the screen, click Actions > Export model as ….

  4. Select the Python panel.

  5. Click Export Model.

Dataiku screenshot of the dialog for exporting a Python model.

See also

See the reference documentation on Python model exports to understand the requirements, usage, and limitations.

Run the model outside of Dataiku#

Exporting the model downloads a zip file to your machine containing the components needed to score data directly with Python outside of Dataiku.

Inspect the downloaded package#

Let’s take a look at what Dataiku has provided.

  1. Unzip the downloaded package to find:

    | File name        | Contents                                     |
    |------------------|----------------------------------------------|
    | model.zip        | The exported model                           |
    | requirements.txt | The Python module required to run the model  |
    | sample.py        | A sample script for making predictions       |

Set up the environment#

The next step is to check that your environment and the downloaded model are ready by running the provided sample script.

  1. In a terminal, navigate to the directory holding these files.

  2. Create a virtual environment for your tests.

    virtualenv python-export
    
  3. Activate that environment.

    source python-export/bin/activate
    

You’ll need to make two small adjustments to the requirements file for the sake of this tutorial.

  1. Open the requirements.txt file, and remove the specific version requirements on the dataiku-scoring package.

  2. Add pandas as a second requirement. (This isn’t mandatory for scoring, but will be used in our Python script later).

  3. Install the packages listed in the requirements file.

    pip install -r requirements.txt
    
  4. Once that’s set up, call the sample script to validate the environment.

    python sample.py
    

This should output the following:

Output of model.predict():
array(['fire', 'fire'], dtype=object)
Output of model.predict_proba():
{'bug': array([0.02705888, 0.0306321 ]),
 'dark': array([0.05454764, 0.03727579]),
 'dragon': array([0.07957995, 0.00496544]),
 'electric': array([0.06280624, 0.06476114]),
 'fairy': array([0.02217147, 0.03600387]),
 'fighting': array([0.05453975, 0.06410458]),
 'fire': array([0.15311388, 0.24131331]),
 'flying': array([0.0058496 , 0.00308777]),
 'ghost': array([0.04494048, 0.029513  ]),
 'grass': array([0.1031577, 0.1232584]),
 'ground': array([0.04200412, 0.02563218]),
 'ice': array([0.03195237, 0.03471062]),
 'normal': array([0.03372282, 0.0405713 ]),
 'poison': array([0.04058422, 0.06011815]),
 'psychic': array([0.04955909, 0.06700692]),
 'rock': array([0.05377793, 0.0422824 ]),
 'steel': array([0.04674354, 0.00999445]),
 'water': array([0.09389033, 0.08476859])}
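
The predictions above come from the predict() and predict_proba() methods of the loaded model. As a rough sketch of how the exported model can be called in your own code (the actual contents of sample.py may differ, and the feature columns below are placeholders for the ones your model was trained on):

    from dataikuscoring import load_model
    import pandas as pd

    # Load the exported model from the zip file produced by Dataiku
    model = load_model('model.zip')

    # Hypothetical records to score; replace with your model's real feature columns
    records = pd.DataFrame([
        {'attack': 100, 'defense': 80, 'speed': 90, 'hp': 70},
        {'attack': 60, 'defense': 55, 'speed': 120, 'hp': 50},
    ])

    print('Output of model.predict():', model.predict(records))
    print('Output of model.predict_proba():', model.predict_proba(records))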

Export data for scoring#

Now that you’ve verified a working Python model, you’ll also need a batch of data prepared for scoring. The starter project already includes this.

  1. Return to your project in Dataiku.

  2. In the Design Flow zone, select the pokemon_for_scoring dataset.

  3. If it’s empty, click Build > Build Dataset with the default Build Only This setting.

  4. Once the dataset is built, click Export in the Actions tab.

  5. Click Download to save a local CSV file of the data to be scored on your machine.

  6. Move it to the directory containing model.zip.

Dataiku screenshot of the dialog for exporting the data needed for scoring.

Score data with a Python script#

Now that you have a Python model and data for scoring, let’s make a script to generate predictions for that data using the model.

  1. Create a file called scoring_dataiku.py with the following code:

    from dataikuscoring import load_model
    import pandas as pd

    # Load the model from the current export path
    model = load_model('model.zip')

    # Score the exported data with the model
    input_df = pd.read_csv('pokemon_for_scoring.csv')
    predict_result = model.predict(input_df)

    # Append the predictions as a new column and write the prediction log
    output_df = input_df
    output_df['prediction'] = predict_result
    print("Output of model.predict(): {}".format(output_df))
    output_df.to_csv('pokemon_scored_python.csv', index=False)
    
  2. Move the scoring_dataiku.py file to the directory containing model.zip and pokemon_for_scoring.csv.

  3. Generate predictions on the entire dataset by running:

    python scoring_dataiku.py
    

    This action should create the following output and a CSV file called pokemon_scored_python.csv.

Terminal screenshot of output after running the scoring script.
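
Before importing the file back into Dataiku, you can optionally run a quick sanity check that the prediction column was written, for example with a short pandas snippet like this one (assuming the virtual environment from earlier is still active):

    import pandas as pd

    # Inspect the prediction log produced by scoring_dataiku.py
    scored = pd.read_csv('pokemon_scored_python.csv')
    print(scored['prediction'].value_counts())  # distribution of predicted types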

Import prediction logs for monitoring#

You have now used the model to make predictions and saved these predictions in a usable format. Next, you need to import the prediction data into Dataiku so that you can monitor it in the usual way.

  1. In the Python Monitoring Flow zone, open the predictions_python dataset.

  2. Navigate to the Settings tab.

  3. Delete the existing CSV file.

  4. Click Select Files to replace it with the pokemon_scored_python.csv file that you just created.

  5. Click Save to confirm.

Dataiku screenshot of the settings tab of a dataset of predictions scored with Python.

Compute a model evaluation#

Like the other monitoring zones in this project, the Python Monitoring Flow zone includes:

  • An Evaluate recipe with two inputs: the saved model and a dataset of prediction logs (this time created externally with Python).

  • A model evaluation store as output, which computes standard drift metrics between the model’s training dataset and the actual predictions.

Let’s build the model evaluation store to check the drift of the input data and predictions computed externally with Python.

  1. In the Python Monitoring Flow zone, open the empty Monitoring - Python Export model evaluation store.

  2. In the Actions tab, click Build > Build Evaluation Store with the default Build Only This setting.

  3. When the build finishes, refresh the page to find the same set of metrics you’d see after building the model evaluation store in the other Flow zones.

Dataiku screenshot of a model evaluation store on data scored externally with Python.

Automate model monitoring#

At this point, you have seen an example of how a model export can generate a log file usable to compute monitoring metrics.

In a real use case, the first point to solve is how to automatically move the prediction file from where it’s generated to a place accessible to Dataiku. Possible solutions include sending it via FTP or pushing it directly to cloud storage. Rather than looking for a one-size-fits-all solution, analyze this question on a case-by-case basis.
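
As an illustration of the cloud storage option, here is a minimal sketch that pushes the prediction log to an S3 bucket with boto3; the bucket name and key are placeholders, and in practice you would point them at a location backed by a Dataiku connection:

    import boto3

    # Upload the prediction log produced by scoring_dataiku.py to a bucket
    # that Dataiku can read from (hypothetical bucket and path)
    s3 = boto3.client('s3')
    s3.upload_file(
        'pokemon_scored_python.csv',
        'my-monitoring-bucket',
        'prediction_logs/pokemon_scored_python.csv',
    )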

Once you have configured the retrieval of logs in an accessible place, you can create a simple scenario to run the Evaluate recipe and generate a model evaluation. You can then enrich the scenario with checks to automate alerts, as done in this project’s Monitor batch job scenario, or as explained in more detail in Tutorial | Model monitoring with a model evaluation store.
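
To give an idea of how such a scenario could be triggered from the external system once new logs have landed, here is a minimal sketch using the dataikuapi package; the host URL, API key, project key, and scenario ID below are placeholders for your own values:

    import dataikuapi

    # Connect to the Dataiku instance (hypothetical URL and API key)
    client = dataikuapi.DSSClient('https://dss.example.com:11200', 'YOUR_API_KEY')
    project = client.get_project('MODEL_MONITORING')        # hypothetical project key

    # Run the monitoring scenario and wait for it to finish
    scenario = project.get_scenario('MonitorPythonExport')  # hypothetical scenario ID
    scenario.run_and_wait()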

Automate model deployment (optional)#

It’s perfectly acceptable to keep the deployment of the model as a manual process, as deploying a new model version might not be a frequent operation. However, if you wish to automate it, you can build a full deployment pipeline by leveraging Dataiku APIs, typically by training a new model version and then downloading the new model export.

Note

See the Developer Guide for a more detailed example that you can adapt for your needs.