Hands-On: Metrics & Checks

A data science project is never finished. Datasets need to be updated. Models need to be rebuilt. To put such a project into production, it is essential to have tools that can track the evolution of objects like datasets and models. Key to achieving this in Dataiku DSS are metrics, checks, and scenarios.

  • In this hands-on lesson, we’ll focus on understanding how to use metrics and checks in Dataiku DSS in order to monitor the status of datasets and models.

  • In a later lesson, we’ll demonstrate how to use metrics and checks inside of a scenario in order to safely automate workflows.

  • Finally, we’ll look at ways to customize metrics, checks, and scenarios with code.

Prerequisites

This hands-on lesson assumes that you have a basic level of comfort using Dataiku DSS, as well as knowledge of variables. If you are not already on the Advanced Designer learning path, we recommend completing the Core Designer certificate first.

Also, you must have the following three plugins installed on your Dataiku DSS instance.

These plugins are available through the Dataiku Plugin Store, and you can find instructions for installing plugins in the reference documentation. To check whether a plugin is already installed on your instance, go to the Installed tab of the Plugin Store to see a list of all installed plugins.

../../_images/automation-plugins.png
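If you prefer to confirm this programmatically, the public API client can list installed plugins. Below is a minimal sketch; the host URL, the API key, and the availability of list_plugins() in your client version are assumptions to adapt to your own instance.

```python
import dataikuapi

# Placeholders: replace the host URL and API key with your own instance details.
client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")

# List the plugins installed on the instance and print their identifiers.
for plugin in client.list_plugins():
    print(plugin["id"])
```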

Create Your Project

You can get started with this project in three different ways.

Continue from the Previous Course

You can begin this lesson by continuing with the same project you built in the Plugin Store course.

If you have not yet started that course, you have two further options.

Import a New SQL-based Project

Those with a PostgreSQL connection on their instance have the option to import the SQL-based version of this project from the Dataiku homepage.

  • Click +New Project > DSS Tutorials > Advanced Designer > Automation (SQL-based Tutorial).

Because the imported project uses datasets that point to a PostgreSQL connection named postgresql, you will get connection-related errors if your connection has a different name. See this article on Connection Remapping for help resolving them.

Note

The screenshots below use the SQL-based project.

Import a New File-based Project

If you do not have a PostgreSQL connection on your Dataiku DSS instance, and do not wish to create one, you can import the file-based version of this project from the Dataiku homepage.

  • From the Dataiku homepage, click +New Project > DSS Tutorials > Advanced Designer > Automation (File-based Tutorial).

Build Your Project

Your starter project only has the skeleton of the Flow. The datasets have not yet been built.

../../_images/starting-flow.png

Let’s build the parts we need.

  • From the Flow, select the two output datasets relevant for this tutorial: merchants_with_tract_income and transactions_unknown_scored.

  • With both datasets selected, choose Build from the Actions sidebar.

  • By default, the option “Build required dependencies” should be chosen. Click Preview to view the suggested job.

  • Now in the Jobs tab, we can see all of the activities Dataiku DSS will perform.

  • Click Run and observe how Dataiku DSS progresses through the list of activities.

../../_images/job-preview.png
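The same build can also be launched through the public API. Here is a minimal sketch, assuming a local instance, a personal API key, and a placeholder project key; adapt all three to your own setup.

```python
import dataikuapi

# Placeholders: host URL, API key, and project key must match your instance.
client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_KEY")

# Build both output datasets and their required upstream dependencies,
# mirroring the "Build required dependencies" option in the UI.
job_definition = {
    "type": "RECURSIVE_BUILD",
    "outputs": [
        {"id": "merchants_with_tract_income"},
        {"id": "transactions_unknown_scored"},
    ],
}
job = project.start_job(job_definition)
print(job.get_status())  # poll this until the job reaches a terminal state
```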

Note

Dataiku DSS may issue a warning about a possible quoting issue in an unusually large column. This warning is generated in response to the product_description column. As we are not working with natural language data here, we can safely ignore this warning.

When the job finishes, note that the project has a variable state_name defined as “Delaware” (see the Variables tab of the More Options menu). This variable gets used in the Group recipe that creates the merchants_by_state dataset. Accordingly, the only value for merchant_state in the merchants_with_tract_income dataset is “Delaware”. We’ll see how to change this with a scenario later.
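As a preview, project variables can also be read and updated in code. The sketch below assumes it runs in a Python notebook or recipe inside this project; the replacement value is purely illustrative.

```python
import dataiku

# Inside the project, read the resolved project variables.
print(dataiku.get_custom_variables()["state_name"])  # "Delaware"

# A public API handle on the current project lets us update the variable;
# the Group recipe will pick up the new value on its next build.
project = dataiku.api_client().get_default_project()
variables = project.get_variables()
variables["standard"]["state_name"] = "New Jersey"  # illustrative value only
project.set_variables(variables)
```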

Default Metrics

Go back to the Flow. A key dataset in the Flow is transactions_joined_prepared. Let’s establish some metrics and checks for this dataset in order to monitor its status.

On the Status tab of every dataset in Dataiku DSS, we find a few default metrics depending on the type of dataset.

  • Open transactions_joined_prepared and navigate to the Status tab.

  • On the Metrics subtab, we can control which metrics to display and in what format. Two metrics in this case, Column count and Record count, are displayed by default.

  • Click Compute to calculate all of the displayed metrics.

../../_images/default-metrics.png
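Metrics can also be computed and read programmatically. This is a minimal sketch, assuming it runs in a notebook inside the project; the exact identifiers returned will depend on which metrics you have enabled.

```python
import dataiku

# Trigger the same computation as the Compute button on the Metrics subtab.
project = dataiku.api_client().get_default_project()
project.get_dataset("transactions_joined_prepared").compute_metrics()

# Read back the last computed values and list their identifiers.
metrics = dataiku.Dataset("transactions_joined_prepared").get_last_metric_values()
for metric_id in metrics.get_all_ids():
    print(metric_id)  # e.g. "records:COUNT_RECORDS", "basic:COUNT_COLUMNS"
```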

Create Your Own Metrics

Now let’s take further control of our metrics. Navigate to the Edit subtab.

We can see that both “Column count” and “Record count” are switched “ON”. Only “Column count”, however, is set to auto-compute after build (for the SQL-based tutorial).

../../_images/auto-compute.png

Now, let’s create a new metric from the available options.

  • Turn on the Column statistics section. Here we can create metrics out of basic statistics like the minimum, maximum, or count of any column in the dataset.

By definition, FICO scores range from 300 to 850.

  • With this range in mind, let’s track the Min and Max of card_fico_score.

  • Also, select the Distinct value count of merchant_state.

  • From this screen, run the probe to compute the three requested metrics.

../../_images/col-stats-metrics.png
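Once computed, these column statistics can be fetched by identifier. The sketch below assumes the usual "col_stats:&lt;STAT&gt;:&lt;column&gt;" identifier pattern and that the most recent value sits in the metric's lastValues list; confirm both with get_all_ids() on your own instance.

```python
import dataiku

metrics = dataiku.Dataset("transactions_joined_prepared").get_last_metric_values()

# Assumed identifier pattern for column statistics: "col_stats:<STAT>:<column>".
min_fico = metrics.get_metric_by_id("col_stats:MIN:card_fico_score")

# Assumed payload shape: the most recent value sits in the "lastValues" list.
print(min_fico["lastValues"][0]["value"])  # expected to be 300
```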

The next section of metrics allows us to compute the most frequent values of a column.

  • Turn on Most frequent values.

  • Select the Mode of merchant_subsector and the Top 10 values of merchant_state.

  • Click to run this probe as well and see the last run results.

../../_images/most-frequent-values.png

In addition to column statistics, we can also retrieve cell values as metrics. Imagine that we want to be alerted if any transaction from one of our largest customers is not authorized. We can create this kind of metric with a cell value probe.

  • From the bottom of the screen, create a New Cell Value Probe.

  • With the Filter enabled, keep all rows satisfying the conditions:

    • card_id equals C_ID_2d1fec14d8 (the card_id of one of our largest customers).

    • authorized_flag equals 0.0 (those failing authorization).

  • The output of the filter should be a First row matching the filter.

  • Select the authorized_flag column.

  • Save the metric. Click to run it now.

../../_images/cell-value-probe.png

The cell value probe should return “No data” because no transaction from this customer has failed authorization. We could verify this with the present dataset using a Filter recipe.
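One quick way to verify it in code is to load the dataset into pandas and apply the same filter; a small sketch, assuming the dataset fits in memory and authorized_flag loads as a numeric column.

```python
import dataiku

# Load the dataset into a pandas DataFrame and apply the probe's filter.
df = dataiku.Dataset("transactions_joined_prepared").get_dataframe()

failed = df[(df["card_id"] == "C_ID_2d1fec14d8") & (df["authorized_flag"] == 0.0)]
print(len(failed))  # 0 expected, matching the probe's "No data" result
```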

Note

In addition to these kinds of built-in metrics, we can also define custom metrics with Python, SQL, or plugins. We’ll explore these in another lesson.

When returning to the Metrics subtab, we find our newly-created metrics as options available to be displayed.

  • Click on the button showing the number of metrics currently displayed, and Add all.

  • Click Compute to calculate the metrics if you have not already done so.

  • Experiment with different views by changing the Display from “Last value”, to “History”, to “Columns”.

../../_images/computed-metrics.png

Create Your Own Checks

Now let’s use these metrics to establish checks on the dataset.

  • Navigate to the Checks subtab.

  • None exist yet, so navigate to the Edit subtab to create one.

By definition, we know FICO scores range from 300 to 850. If a value is outside of this range, we will assume there must be some kind of data quality problem, and we want a failure notice.

  • Under “Add a new check”, select Metric value is in a numeric range.

  • Name it FICO >= 300.

  • Choose Min of card_fico_score as the metric.

  • Set the Minimum to 300.

  • Click Check to see how it will work. It should return “OK”, and the message 300. We could have found the same result in the Analyze tool.

Note

When we check if a metric is in a numeric range, we have the ability to define a hard or soft maximum or minimum, depending on whether we want to trigger an error or a warning.

  • A value less than the minimum or greater than the maximum value produces an error.

  • A value less than the soft minimum or greater than the soft maximum produces a warning.
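In plain Python, the logic behind these four thresholds looks roughly like the sketch below. This is only an illustration of the behavior, not the actual implementation.

```python
def check_numeric_range(value, minimum=None, maximum=None,
                        soft_minimum=None, soft_maximum=None):
    # Hard bounds produce an error; soft bounds only produce a warning.
    if (minimum is not None and value < minimum) or \
       (maximum is not None and value > maximum):
        return "ERROR"
    if (soft_minimum is not None and value < soft_minimum) or \
       (soft_maximum is not None and value > soft_maximum):
        return "WARNING"
    return "OK"

print(check_numeric_range(300, minimum=300))                    # "OK"
print(check_numeric_range(300, minimum=320))                    # "ERROR"
print(check_numeric_range(300, minimum=300, soft_minimum=320))  # "WARNING"
```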

Let’s test this out on the same check.

  • Change the minimum of the “FICO >= 300” check from 300 to 320 and run the check. Instead of returning “OK”, it returns an error because we have values in the dataset less than 320.

  • Reset the minimum to 300 and add a soft minimum of 320 (a very risky credit score). Now the check returns “WARNING”.

../../_images/fico-above-300.png

We can follow a similar process to check the upper bound.

  • Under “Add a new check”, select Metric value is in a numeric range.

  • Name it FICO <= 850.

  • Choose Max of card_fico_score as the metric.

  • Set the Maximum to 850.

  • Click Check to confirm it is working as intended.

Assuming that all of these transactions are from the United States, we know that there should not be more than 51 distinct values of merchant_state (including the District of Columbia as a state). We can create a check to monitor this.

  • Under “Add a new check”, select Metric value is in a numeric range.

  • Name it Valid merchant_state.

  • Choose Distinct value count of merchant_state as the metric.

  • Set the Maximum to 51.

  • After running the check, it should return “OK”.

We can also check whether a metric falls within a set of categorical values. For example, our domain knowledge might lead us to expect that the most frequent merchant_subsector should be “gas”. Let’s make this a check.

  • Under “Add a new check”, select Metric value is in a set of values.

  • Name it 'gas' is subsector mode.

  • Choose Mode of merchant_subsector as the metric.

  • Add gas as the value.

  • After running the check, it should return “OK”.

After saving the new checks, navigate from the Edit subtab back to the Checks subtab.

  • Display all of the newly-created Checks.

  • Click Compute.

../../_images/computed-checks.png
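Checks can be run programmatically as well. A minimal sketch, assuming a notebook inside the project; the shape of the returned result may differ slightly across DSS versions, so start by printing it whole.

```python
import dataiku

# Run the dataset's checks -- the same action as the Compute button on the
# Checks subtab -- and inspect the raw result.
project = dataiku.api_client().get_default_project()
result = project.get_dataset("transactions_joined_prepared").run_checks()
print(result)  # each check reports an outcome such as "OK", "WARNING", or "ERROR"
```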

Note

We can also create custom checks with Python or plugins. We’ll see this in another lesson.

Model Metrics & Checks

Datasets are not the only Dataiku DSS object for which we can monitor metrics and checks. Models are another object in need of close monitoring.

In the Flow, the green diamond represents the deployed model that predicts whether a credit card transaction will be authorized or not.

  • Select it and Retrain it (non-recursively) from the Actions sidebar.

On opening the deployed model, we can see the active and previous versions. Note how the ROC AUC, or AUC, a common performance metric for classification models, is around 0.75.

../../_images/model-versions.png

We can track this metric, along with many other common indicators of model performance.

  • Navigate to the Metrics & Status tab.

  • On the View subtab, click on Display to show the available built-in model metrics.

  • Add and save AUC to the list of metrics to display. Other kinds of performance metrics, such as accuracy, precision and recall, are also available.

../../_images/model-metrics.png
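The same AUC can be fetched through the public API. A sketch, assuming a notebook inside the project; "MODEL_ID" is a placeholder (find the real identifier in the model's URL or via project.list_saved_models()), and the exact key names in the returned metrics dictionary may vary.

```python
import dataiku

project = dataiku.api_client().get_default_project()

# "MODEL_ID" is a placeholder for the deployed model's identifier.
model = project.get_saved_model("MODEL_ID")

# Look up the active version and read its performance metrics.
active_version_id = model.get_active_version()["id"]
details = model.get_version_details(active_version_id)
print(details.get_performance_metrics())  # includes the AUC, among others
```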

Now let’s create a check to monitor this metric.

  • Navigate to the Settings tab.

  • Under the Status checks subtab, add a new check for Metric value is in a numeric range.

  • Name it 0.60 <= AUC <= 0.95.

  • Select AUC as the metric to check.

  • Set the minimum to 0.6 and the maximum to 0.95 to throw an error if the model’s performance has either degraded or become suspiciously high.

  • Set a soft minimum of 0.65 and a soft maximum of 0.9 to warn us if the deployed model’s performance has drifted in either direction.

  • Run the check to see if it is working.

../../_images/model-check.png

Note

This AUC range is just an example. For your own use case, an allowable deviation in AUC may be very different.

Return to the Metrics & Status tab and add the new check to the Display.

../../_images/model-checks-computed.png

Now we have covered the basic functionality of built-in metrics and checks for datasets and models in Dataiku DSS. In a later lesson, we’ll introduce how to write our own metrics and checks with code and plugins.

What’s next?

For more information, please visit the reference documentation sections on metrics and checks.

Metrics and checks can be useful all on their own. However, their value increases even further when teamed with scenarios.

We’ll see how that works in the next section.