An exported Python model scored externally#
In some cases, we may wish to export a model designed in Dataiku so that it can be run on an external system, for example within a larger pipeline where other tasks run before and after the model is called. However, even if model scoring is done externally, model monitoring can still be done within Dataiku.
As a means of comparison to other deployment contexts, this section presents how to monitor a model within Dataiku in situations where the model is exported to Python and scored externally.
Additional prerequisites#
In addition to the shared prerequisites, you’ll also need:
Basic knowledge of Python.
Python 3 on your machine.
A Dataiku instance with the Python model export feature activated.
Export the model#
For scoring to occur outside of Dataiku, we’ll first need to export the model used in the starter project.
Double click to open the saved model deployed to the Flow.
Open the report for the Active version by clicking on the model name Random forest (s1) - v3 at the top left of the tile.
At the top right of the screen, click Actions > Export model as ….
Select the Python panel.
Click Export Model.
Note
See the reference documentation on Python model exports to understand the requirements, usage, and limitations.
Run the model outside of Dataiku#
This action downloaded a zip file to your machine containing the components needed for scoring directly with Python outside of Dataiku.
Inspect the downloaded package#
Let’s take a look at what Dataiku has provided.
Unzip the downloaded package to find:
File name | Contents |
---|---|
model.zip | The exported model |
requirements.txt | The Python module required to run the model |
sample.py | A sample script for making predictions |
Set up the environment#
The next step is to check that your environment and downloaded model are ready by running a sample script.
On the terminal, navigate to the directory holding these files.
Create a virtual environment for your tests.
virtualenv python-export
Activate that environment.
source python-export/bin/activate
You’ll need to make two small adjustments for the sake of this tutorial.
Open the requirements.txt file, and remove the specific version requirement on the dataiku-scoring package.
Add pandas as a second requirement. (This isn't mandatory for scoring, but it will be used in our Python script later.)
Load the requirements file.
pip install -r requirements.txt
Once that is set up, call the sample script to validate the environment.
python sample.py
This should output the following:
Output of model.predict():
array(['fire', 'fire'], dtype=object)
Output of model.predict_proba():
{'bug': array([0.02705888, 0.0306321 ]),
'dark': array([0.05454764, 0.03727579]),
'dragon': array([0.07957995, 0.00496544]),
'electric': array([0.06280624, 0.06476114]),
'fairy': array([0.02217147, 0.03600387]),
'fighting': array([0.05453975, 0.06410458]),
'fire': array([0.15311388, 0.24131331]),
'flying': array([0.0058496 , 0.00308777]),
'ghost': array([0.04494048, 0.029513 ]),
'grass': array([0.1031577, 0.1232584]),
'ground': array([0.04200412, 0.02563218]),
'ice': array([0.03195237, 0.03471062]),
'normal': array([0.03372282, 0.0405713 ]),
'poison': array([0.04058422, 0.06011815]),
'psychic': array([0.04955909, 0.06700692]),
'rock': array([0.05377793, 0.0422824 ]),
'steel': array([0.04674354, 0.00999445]),
'water': array([0.09389033, 0.08476859])}
Export data for scoring#
Now that we’ve verified a working Python model, we’ll also need a batch of data prepared for scoring. We already have this in the Dataiku project.
Return to your project in Dataiku.
In the Design Flow zone, select the pokemon_for_scoring dataset.
If it is empty, click Build > Build Dataset with the default Build Only This setting.
Once the dataset has data, click Export in the Actions tab.
Click Download to get a local CSV file of the data ready to be scored on your machine.
Move it to the directory containing model.zip.
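Optionally, before writing the scoring script, you can sanity-check the exported file with pandas (installed above alongside the scoring requirements). This quick check is not part of the original steps; it simply reads the CSV just downloaded from Dataiku.

```python
import pandas as pd

# Quick look at the data exported from Dataiku before scoring it
df = pd.read_csv('pokemon_for_scoring.csv')
print(df.shape)             # number of rows and columns to be scored
print(df.columns.tolist())  # column names present in the exported file
print(df.head())
```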
Score data with a Python script#
Now that we have a Python model and data for scoring, let’s make a script to generate predictions for that data using the model.
Create a file called scoring_dataiku.py with the following code:

```python
from dataikuscoring import load_model
import pandas as pd

# Load the model from current export path
model = load_model('model.zip')

input_df = pd.read_csv('pokemon_for_scoring.csv')
predict_result = model.predict(input_df)
output_df = input_df
output_df['prediction'] = predict_result
print(" Output of model.predict(): {}".format(output_df))
output_df.to_csv('pokemon_scored_python.csv', index=False)
```
Move the scoring_dataiku.py file to the directory containing model.zip and pokemon_for_scoring.csv.
Generate predictions on the entire dataset by running:
python scoring_dataiku.py
This action prints the scored data and creates a CSV file called pokemon_scored_python.csv.
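If you also want per-class probabilities in the prediction log, the same model object exposes predict_proba(), which returns a dict mapping each class name to an array of probabilities (as shown in the sample script output above). Below is an optional, hedged variant of scoring_dataiku.py, not part of the original steps, that writes one probability column per class:

```python
from dataikuscoring import load_model
import pandas as pd

# Re-load the model and data, then score with both predict() and predict_proba()
model = load_model('model.zip')
input_df = pd.read_csv('pokemon_for_scoring.csv')

output_df = input_df.copy()
output_df['prediction'] = model.predict(input_df)

# predict_proba() returns a dict of class name -> array of probabilities
for class_name, proba in model.predict_proba(input_df).items():
    output_df['proba_{}'.format(class_name)] = proba

output_df.to_csv('pokemon_scored_python.csv', index=False)
```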
Import prediction logs for monitoring#
We have now used the model to make predictions and exported them in a usable format. Next, we need to import the prediction data into Dataiku so that it can be monitored in the usual way.
In the Python Monitoring Flow zone, open the predictions_python dataset.
Navigate to the Settings tab.
Delete the existing CSV file.
Click Select Files to replace it with the pokemon_scored_python.csv file that you just created.
Click Save to confirm.
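In a production setting, you would typically script this upload instead of using the UI. The sketch below uses the dataikuapi public client from the machine that produced the predictions; the host URL, API key, and project key are placeholders, and it assumes predictions_python is an uploaded-files dataset. Verify the exact method names against the Dataiku API documentation.

```python
import dataikuapi

# Connect to the Dataiku instance (placeholder URL and API key)
client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_KEY")

# Push the freshly scored CSV into the uploaded dataset used for monitoring
dataset = project.get_dataset("predictions_python")
with open("pokemon_scored_python.csv", "rb") as f:
    dataset.uploaded_add_file(f, "pokemon_scored_python.csv")
```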
Compute a model evaluation#
Like the other monitoring zones in this project, the Python Monitoring Flow zone includes:
An Evaluate recipe with two inputs: the saved model and a dataset of prediction logs (this time created externally with Python).
A model evaluation store as output computing standard drift metrics between the training dataset of the model and the actual predictions.
Let’s build the model evaluation store to check the drift of the input data and predictions computed externally with Python.
In the Python Monitoring Flow zone, open the empty Monitoring - Python Export model evaluation store.
In the Actions tab, click Build > Build Evaluation Store with the default Build Only This setting.
When finished building, refresh the page to find the same set of metrics you’d find if you built the MES in other Flow zones.
Automate model monitoring#
At this point, you have seen an example of how an exported model, scored externally, can produce a log file usable for computing monitoring metrics.
In a real use case, the first problem to solve is how to automatically move the prediction file from where it is generated to a place accessible to Dataiku. Possible solutions include sending it via FTP or pushing it directly to cloud storage. Rather than looking for a one-size-fits-all solution, this problem should be analyzed on a case-by-case basis.
Once you have configured the retrieval of logs in an accessible place, you can create a simple scenario to run the Evaluate recipe and generate a model evaluation, which you can then enrich with checks to automate alerts as done in this project’s Monitor batch job scenario or explained in more detail in Tutorial | Model monitoring with a model evaluation store.
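As an illustration, the same external pipeline that uploads the log file could then trigger the monitoring scenario through the public API. This is a minimal sketch with dataikuapi; the scenario ID is a placeholder for whatever scenario you create in your project.

```python
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_KEY")

# Run the monitoring scenario (Evaluate recipe, checks, alerts) and wait for completion
scenario = project.get_scenario("MONITOR_PYTHON_EXPORT")
scenario.run_and_wait()
```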
Automate model deployment (optional)#
It is perfectly acceptable to keep model deployment as a manual process, as deploying a new model version might not be a very frequent operation. However, if you want to automate it, you can build a full deployment pipeline by leveraging Dataiku APIs, typically by training a new model version and then downloading the new Python model export.
Note
See the Developer Guide for a more detailed example that you can adapt for your needs.
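For reference, a rough sketch of what such a pipeline could look like with the Python client is shown below: retrain the model in its ML task, then download the Python export of the newly trained version. The analysis and ML task IDs are placeholders, and the training and export methods used here (train(), get_trained_model_details(), get_scoring_python()) are assumptions to verify against the Developer Guide example.

```python
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_KEY")

# Retrain the model in its visual analysis ML task (IDs are placeholders)
mltask = project.get_analysis("ANALYSIS_ID").get_ml_task("MLTASK_ID")
trained_ids = mltask.train()  # assumption: blocks until training finishes and returns model ids

# Download the Python export of the newly trained version
# (assumption: get_scoring_python() writes the export zip to the given path)
details = mltask.get_trained_model_details(trained_ids[0])
details.get_scoring_python("model.zip")
```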