Tutorial | Monitoring models: An API endpoint on a Dataiku API node#
Many data science workloads call for a real-time API framework, where queries sent to an API endpoint receive an immediate response.
For comparison with other deployment contexts, this article presents how to monitor a model under a real-time API framework while staying entirely within Dataiku.
In this tutorial, you will:
Create a model monitoring feedback loop on an API endpoint deployed on a Dataiku API node.
In addition to the prerequisites laid out in the introduction, you’ll also need:
Deploy the model as an API endpoint#
The starter project already contains the API endpoint that we want to monitor, so the next step is pushing a version of an API service including the endpoint to the API Deployer.
From the top navigation bar, navigate to the API Designer from within the More Options menu.
Open the pokemon API service.
Note how it includes one prediction endpoint called guess using the model found in the Flow.
Click Publish on Deployer, and OK to confirm publishing v1 of the service to the API Deployer.
Once on the API Deployer, we can deploy the service to an infrastructure.
Find the API version that you just pushed, and click Deploy.
Select the configured API infrastructure, click Deploy, and then click Deploy once more to confirm.
To review the mechanics of real-time API deployment in greater detail, please see the Tutorial | Real-time API basics.
Generate activity on the API endpoint#
Before we set up the monitoring portion of this project, we need to generate some activity on the API endpoint so that we have actual data on the API node to retrieve in the feedback loop.
When viewing the deployment on the API Deployer, navigate to the Run and test tab for the guess endpoint.
Click Run All to send several test queries to the API node.
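Beyond the Run and test tab, you can also generate activity by querying the endpoint programmatically. The sketch below is a minimal example with Python's standard library only; the base URL `https://apinode.example.com` is a placeholder, and the feature names are illustrative, while the service (`pokemon`) and endpoint (`guess`) IDs come from this tutorial.

```python
import json
import urllib.request


def predict_url(apinode_base, service_id, endpoint_id):
    """Build the prediction URL exposed by a Dataiku API node endpoint."""
    return f"{apinode_base}/public/api/v1/{service_id}/{endpoint_id}/predict"


def build_query(features):
    """Wrap a dict of feature values in the JSON body the endpoint expects."""
    return {"features": features}


def send_query(apinode_base, service_id, endpoint_id, features):
    """POST one prediction query to the API node (requires a live deployment)."""
    req = urllib.request.Request(
        predict_url(apinode_base, service_id, endpoint_id),
        data=json.dumps(build_query(features)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example payload construction (no network call); feature names are hypothetical.
url = predict_url("https://apinode.example.com", "pokemon", "guess")
query = build_query({"attack": 49, "defense": 49, "speed": 45})
```

Each query answered by the API node produces a log record that the Event server collects, which is the raw material for the feedback loop below.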
Create a feedback loop on the API endpoint#
Now direct your attention to the Dataiku Monitoring (API) Flow zone. Just like the batch Flow zone, we have an Evaluate recipe that takes two inputs (a dataset of predictions and a saved model) and outputs a model evaluation store. However, there are two subtle differences.
API node log data#
The input data in this context comes directly from the API node. We need to point the pokemon_on_static_api_logs dataset to the storage of the API endpoint prediction logs according to the Event server’s configuration. (An admin can find this information under Administration > Settings > Event Server on the Design node).
Open the pokemon_on_static_api_logs dataset. There will be a warning that it is empty.
Navigate to the Settings tab.
In the Files subtab, set Read from to the connection specific to the configuration of your Event server.
Click Browse to navigate the file directory, and find the Path specific to the configuration of your Event server.
Click api-node-query, and then select the name of the API deployment for this project.
Click OK, and confirm that the path ends with the name of your API deployment.
Click List Files to observe which logs are available, and Save when ready.
If using Dataiku Cloud, you can access API query logs from the S3 connection customer-audit-logs within the path apinode-audit-logs.
After pointing this dataset to the correct prediction logs, we can now explore it. Each row is an actual prediction request answered by our model. You can find all the requested features, the resulting prediction with its details, and other technical data.
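To make the shape of these logs concrete, the sketch below flattens one nested prediction request into the dotted column names you would see in the dataset. The record itself is hypothetical, and the exact column names (`features.*`, `prediction`, and so on) can vary by Dataiku version, so treat the naming as an illustrative assumption.

```python
def flatten_api_log(record, prefix=""):
    """Flatten a nested API-node log record into dotted column names,
    e.g. {"features": {"attack": 49}} -> {"features.attack": 49}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_api_log(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# A hypothetical single prediction request as collected by the Event server
raw = {
    "serverTimestamp": "2024-05-01T12:00:00Z",
    "endpointId": "guess",
    "features": {"attack": 49, "defense": 49, "speed": 45},
    "prediction": "Grass",
}
row = flatten_api_log(raw)
# row["features.attack"] == 49; row["prediction"] == "Grass"
```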
Although we are showing a local filesystem storage for the API node logs to make the project import easier, in a real situation, file-based cloud storage is highly recommended. This data can grow quickly, and it will not decrease unless explicitly truncated.
It would also be common to activate partitioning for this dataset.
The Evaluate recipe with API node logs as input#
Another subtle difference between the Evaluate recipe in the API Flow zone compared to the Batch Flow zone is the option to automatically handle the input data as API node logs.
With this option activated (it is detected by default), you do not need to worry about the additional columns or their naming.
Observe the Settings tab of the Evaluate recipe in the Dataiku Monitoring (API) Flow zone.
Click Run to produce a model evaluation of the API node logs.
If using a version of Dataiku prior to 11.2, you will need to add a Prepare recipe to keep only the features and prediction columns, and rename them to match the initial training dataset convention.
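For versions prior to 11.2, the equivalent of that Prepare step can be sketched in plain Python: keep only the feature and prediction values, and strip the prefix so the names match the training dataset convention. The `features.*`/`prediction` column names below are assumptions based on typical API node logs, not a guaranteed schema.

```python
def prepare_api_logs(rows, prediction_col="prediction"):
    """Keep only feature and prediction values from API node log rows,
    renaming 'features.<name>' back to '<name>' to match the training data."""
    cleaned = []
    for row in rows:
        out = {
            key.split(".", 1)[1]: value
            for key, value in row.items()
            if key.startswith("features.")
        }
        out[prediction_col] = row[prediction_col]
        cleaned.append(out)
    return cleaned


# Hypothetical log rows: technical columns are dropped, feature prefixes removed
logs = [{
    "serverTimestamp": "2024-05-01T12:00:00Z",
    "features.attack": 49,
    "features.defense": 49,
    "prediction": "Grass",
}]
clean = prepare_api_logs(logs)
# clean[0] == {"attack": 49, "defense": 49, "prediction": "Grass"}
```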
Create a one-click monitoring loop#
Since version 12, users can simplify this process by building the entire feedback loop directly from the API endpoint in the API Designer.
Note that this feature is not available for Dataiku Cloud users; refer to the instructions above for setting up this feedback loop.
On the Design node, navigate to the API Designer from the More options menu of the top navigation bar.
Open the pokemon API service, and click on the Monitoring panel for the guess endpoint.
Click Configure to create a monitoring loop for this endpoint.
Click OK, and then return to the Flow to see the new zone, which, in this case, duplicates the work of the existing Dataiku Monitoring (API) Flow zone.
Having seen the monitoring setup for an API endpoint on a Dataiku API node, you might want to move on to one of the following monitoring cases that reuse the same project. They are independent of each other and can be completed in any order.