Hands-On: Metrics & Checks (Part 1)

A data science project is never finished. Datasets need to be updated. Models need to be rebuilt. To put such a project into production, it is essential to have tools that track how objects like datasets and models evolve over time. In Dataiku DSS, the key tools for this are metrics, checks, and scenarios.

This hands-on tutorial has three parts:

  • In this part, we’ll focus on understanding how to use metrics and checks in Dataiku DSS in order to monitor the status of datasets and models.

  • In the second part, we’ll demonstrate how to use metrics and checks inside of a scenario in order to safely automate workflows.

  • In the third part, we’ll look at ways to customize metrics, checks, and scenarios with code.

Prerequisites

This hands-on lesson assumes that you have a basic level of comfort using Dataiku DSS, as well as knowledge of variables.

Note

If you are not already on the Advanced Designer learning path, we recommend completing the Core Designer certificate first.

You’ll need access to an instance of Dataiku DSS (version 8.0 or above) with the following plugins installed:

  • US Census

  • Reverse Geocoding

These plugins are available through the Dataiku Plugin Store, and you can find the instructions for installing plugins in the reference documentation. To check whether a plugin is already installed on your instance, go to the Installed tab in the Plugin Store to see a list of all installed plugins.

../../_images/automation-plugins.png

Tip

Two notes for users of Dataiku Online:

  • The process for installing a plugin is slightly different.

    • From your instance launchpad, open the Features panel on the left-hand side.

    • Click Add a Feature and choose “US Census” from the Extensions menu. (Reverse geocoding is already available by default).

  • The end of this tutorial includes a demonstration of custom Python triggers in scenarios, which is not available to Online users. All other parts of the tutorial, however, can be completed.

Create Your Project

Rather than starting from scratch, we’ll use an existing Flow.

  • Click +New Project > DSS Tutorials > Advanced Designer > Automation (Tutorial).

Note

You can also use a successfully completed project from the Plugin Store course.

Change Connections

Aside from the input datasets, all of the others in the Flow are empty, managed filesystem datasets.

You are welcome to leave the storage connection of these datasets in place, but you can also use another storage system depending on the infrastructure available to you.

To use another connection, such as a SQL database, follow these steps:

  • Select the empty datasets from the Flow. (On a Mac, hold Shift to select multiple datasets).

  • Click Change connection in the Other actions section of the Actions sidebar.

  • Use the dropdown menu to select the new connection.

  • Click Save.

../../_images/automation-change-connection.png

Note

Another way to select datasets is from the Datasets page (G+D). There are also programmatic ways of doing operations like this that you’ll learn about in the Developer learning path.

The screenshots below demonstrate using a PostgreSQL database.

Build Your Project

Your starter project only has the skeleton of the Flow. The datasets have not yet been built.

../../_images/starting-flow.png

Let’s build the parts we need for this tutorial.

  • From the Flow, select the two output datasets relevant for this tutorial:

    • merchants_with_tract_income

    • transactions_unknown_scored

  • With both datasets selected, choose Build from the Actions sidebar.

  • By default, the option “Build required dependencies” should be chosen. Click Preview to view the suggested job.

  • Now in the Jobs tab, we can see all of the activities Dataiku DSS will perform.

  • Click Run and observe how Dataiku DSS progresses through the list of activities.

../../_images/job-preview.png

Note

Dataiku DSS issues a warning about a possible quoting issue in an unusually large column. This warning is generated in response to the product_description column. As we are not working with natural language data here, we can safely ignore this warning.

When the job finishes, note that the project has a variable state_name defined as “Delaware” (see the Variables tab of the More Options menu). This variable gets used in the Group recipe that creates the merchants_by_state dataset. Accordingly, the only value for merchant_state in the merchants_with_tract_income dataset is “Delaware”. We’ll see how to change this with a scenario later.

Default Metrics

A key dataset in the Flow is transactions_joined_prepared. Let’s establish some metrics and checks for this dataset in order to monitor its status.

On the Status tab of every dataset in Dataiku DSS, we find a few default metrics depending on the type of dataset.

  • Open transactions_joined_prepared and navigate to the Status tab.

  • On the Metrics subtab, we can control which metrics to display and in what format. Two metrics in this case, Column count and Record count, are displayed by default.

  • Click Compute to calculate all of the displayed metrics, if this has not already been done.

../../_images/default-metrics.png

Create Your Own Metrics

Now let’s take further control of our metrics. Navigate to the Edit subtab.

We can see that both “Column count” and “Record count” are switched ON. Only “Column count”, however, is set to auto-compute after build (although this may differ based on your chosen storage connection).

../../_images/auto-compute.png

Now, let’s create a new metric from the available options.

  • Turn on the Column statistics section. Here we can create metrics out of basic statistics like the minimum, maximum, or count of any column in the dataset.

By definition, FICO scores, a system for scoring creditworthiness, range from 300 to 850.

  • With this range in mind, let’s track the Min and Max of card_fico_score.

  • Also, select the Distinct value count of merchant_state.

  • From this screen, run the probe to compute the three requested metrics.

../../_images/col-stats-metrics.png
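For intuition, the column statistics computed by these probes can be sketched in plain Python. This is illustrative only — Dataiku DSS computes these metrics internally — and the sample rows below are hypothetical:

```python
# Toy rows standing in for transactions_joined_prepared (hypothetical values)
rows = [
    {"card_fico_score": 650, "merchant_state": "Delaware"},
    {"card_fico_score": 301, "merchant_state": "Delaware"},
    {"card_fico_score": 822, "merchant_state": "New Jersey"},
]

fico = [r["card_fico_score"] for r in rows]
metrics = {
    "card_fico_score_min": min(fico),                              # Min probe
    "card_fico_score_max": max(fico),                              # Max probe
    "merchant_state_distinct": len({r["merchant_state"] for r in rows}),  # Distinct value count
}
print(metrics)
```

Each probe run stores results like these against the dataset, building up a history of values over time.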

The next section of metrics allows us to compute the most frequent values of a column.

  • Turn on Most frequent values.

  • Select the Mode of merchant_subsector and the Top 10 values of merchant_state.

  • Click to run this probe as well and see the last run results.

../../_images/most-frequent-values.png
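The "Most frequent values" probe boils down to counting occurrences. A minimal sketch with the standard library (again on hypothetical sample data, not the real columns):

```python
from collections import Counter

# Hypothetical column values
subsectors = ["gas", "restaurants", "gas", "gas"]
states = ["Delaware", "Delaware", "New Jersey", "Delaware", "Pennsylvania"]

mode = Counter(subsectors).most_common(1)[0][0]           # Mode of merchant_subsector
top_states = [s for s, _ in Counter(states).most_common(10)]  # Top 10 values of merchant_state
print(mode, top_states)
```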

In addition to column statistics, we can also retrieve cell values as metrics. Imagine that we want to be alerted if any transaction from one of our largest customers is not authorized. We can create this kind of metric with a cell value probe.

  • From the bottom of the screen, create a New Cell Value Probe.

  • With the Filter enabled, keep all rows satisfying the conditions:

    • card_id equals C_ID_2d1fec14d8 (the card_id of one large customer).

    • authorized_flag equals 0.0 (those failing authorization).

  • The output of the filter should be a First row matching the filter.

  • Select the authorized_flag column.

  • Save the metric. Click to run it now.

../../_images/cell-value-probe.png

The cell value probe should return “No data” because no transaction from this customer has failed authorization. We could verify this with the present dataset using a Filter recipe.
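The probe's filter-then-first-row logic can be mimicked as follows (an illustrative sketch with made-up rows, not the probe's actual implementation):

```python
def first_matching_cell(rows, column):
    """Return the requested cell from the first row matching the filter,
    or 'No data' when no row matches -- mirroring the cell value probe."""
    for r in rows:
        if r["card_id"] == "C_ID_2d1fec14d8" and r["authorized_flag"] == 0.0:
            return r[column]
    return "No data"

# Hypothetical rows: this customer's only transaction was authorized
rows = [
    {"card_id": "C_ID_2d1fec14d8", "authorized_flag": 1.0},
    {"card_id": "C_ID_other", "authorized_flag": 0.0},
]
print(first_matching_cell(rows, "authorized_flag"))  # No data
```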

Note

In addition to these kinds of built-in metrics, we can also define custom metrics with Python, SQL, or plugins. We’ll explore these in another lesson.

When returning to the Metrics subtab, we find our newly-created metrics as options available to be displayed.

  • Click on the button showing the number of metrics currently displayed, and Add all.

  • Click Compute to calculate the metrics, if this has not already been done.

  • Experiment with different views by changing the Display from “Last value”, to “History”, to “Columns”.

../../_images/computed-metrics.png

Create Your Own Checks

Now let’s use these metrics to establish checks on the dataset.

  • Navigate to the Checks subtab.

  • None exist yet, so let's create one from the Edit subtab.

By definition, we know FICO scores range from 300 to 850. If a value is outside of this range, we will assume there must be some kind of data quality problem, and we want a failure notice.

  • Under “Add a new check”, select Metric value is in a numeric range.

  • Name it FICO >= 300.

  • Choose Min of card_fico_score as the metric.

  • Set the Minimum to 300.

  • Click Check to see how it will work. It should return “OK”, and the message 300. We could have found the same result in the Analyze tool.

Note

When we check if a metric is in a numeric range, we have the ability to define a hard or soft maximum or minimum, depending on whether we want to trigger an error or a warning.

  • A value less than the minimum or greater than the maximum value produces an error.

  • A value less than the soft minimum or greater than the soft maximum produces a warning.

Let’s test this out on the same check.

  • Change the minimum of the “FICO >= 300” check from 300 to 320 and run the check. Instead of returning “OK”, it returns an error because we have values in the dataset less than 320.

  • Reset the minimum to 300 and add a soft minimum of 320 (a very risky credit score). Now the check returns “WARNING”.

../../_images/fico-above-300.png
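The hard/soft bound behavior described above can be sketched as a small function. This is a plain-Python illustration of the check semantics, not how DSS implements it:

```python
def range_check(value, minimum=None, maximum=None,
                soft_minimum=None, soft_maximum=None):
    """Mimic the 'Metric value is in a numeric range' check:
    hard bounds produce an error, soft bounds a warning."""
    if minimum is not None and value < minimum:
        return "ERROR"
    if maximum is not None and value > maximum:
        return "ERROR"
    if soft_minimum is not None and value < soft_minimum:
        return "WARNING"
    if soft_maximum is not None and value > soft_maximum:
        return "WARNING"
    return "OK"

# Min of card_fico_score is 300 in these hypothetical runs
print(range_check(300, minimum=320))                    # ERROR
print(range_check(300, minimum=300, soft_minimum=320))  # WARNING
print(range_check(340, minimum=300, soft_minimum=320))  # OK
```

Note that hard bounds are evaluated before soft bounds, so a value below both the minimum and the soft minimum produces an error rather than a warning.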

We can follow a similar process to check the upper bound.

  • Under “Add a new check”, select Metric value is in a numeric range.

  • Name it FICO <= 850.

  • Choose Max of card_fico_score as the metric.

  • Set the Maximum to 850.

  • Click Check to confirm it is working as intended.

Assuming that all of these transactions are from the United States, we know that there should not be more than 51 distinct values of merchant_state (50 states plus the District of Columbia). We can create a check to monitor this.

  • Under “Add a new check”, select Metric value is in a numeric range.

  • Name it Valid merchant_state.

  • Choose Distinct value count of merchant_state as the metric.

  • Set the Maximum to 51.

  • After running the check, it should return “OK”.

We can also check if a metric is within a set of categorical values. For example, our domain knowledge might create the expectation that the most frequent merchant_subsector should be “gas”. Let’s make this a check.

  • Under “Add a new check”, select Metric value is in a set of values.

  • Name it 'gas' is subsector mode.

  • Choose Mode of merchant_subsector as the metric.

  • Add gas as the value.

  • After running the check, it should return “OK”.
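The "Metric value is in a set of values" check is simple membership. A minimal sketch of the idea (illustrative only):

```python
def set_check(value, allowed):
    """Mimic the 'Metric value is in a set of values' check."""
    return "OK" if value in allowed else "ERROR"

# Mode of merchant_subsector in a hypothetical run
print(set_check("gas", {"gas"}))          # OK
print(set_check("restaurants", {"gas"}))  # ERROR
```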

After saving the new checks, navigate from the Edit subtab back to the Checks subtab.

  • Display all of the newly-created Checks.

  • Click Compute.

../../_images/computed-checks.png

Note

We can also create custom checks with Python or plugins. We’ll see this in another lesson.

Model Metrics & Checks

Datasets are not the only Dataiku DSS object for which we can monitor metrics and checks. Models are another object in need of close monitoring.

In the Flow, the green diamond represents the deployed model that predicts whether a credit card transaction will be authorized or not.

  • Select it and Retrain it (non-recursively) from the Actions sidebar.

On opening the deployed model, we can see the active and previous versions. Note how the ROC AUC, or AUC, a common performance metric for classification models, is around 0.75.

../../_images/model-versions.png

We can track this metric, along with many other common indicators of model performance.

  • Navigate to the Metrics & Status tab.

  • On the View subtab, click the Display button to show the available built-in model metrics.

  • Add and save AUC to the list of metrics to display. Other kinds of performance metrics, such as accuracy, precision and recall, are also available.

../../_images/model-metrics.png

Now let’s create a check to monitor this metric.

  • Navigate to the Settings tab.

  • Under the Status checks subtab, add a new check for Metric value is in a numeric range.

  • Name it 0.60 <= AUC <= 0.95.

  • Select AUC as the metric to check.

  • Set the minimum to 0.60 and the maximum to 0.95 to throw an error if the model performance has either degraded or become suspiciously high.

  • Set a soft minimum of 0.65 and a soft maximum of 0.9 to warn us if the performance of the deployed model has decreased or increased.

  • Run the check to see if it is working. Save it.

../../_images/model-check.png

Note

This AUC range is just an example. For your own use case, an allowable deviation in AUC may be very different.
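For intuition about the metric being checked, ROC AUC can be computed with a small rank-based (Mann-Whitney) function. This is a toy re-implementation for illustration — not how DSS computes it — applied against the same bounds used above:

```python
def auc(labels, scores):
    """Rank-based ROC AUC: average 1-based ranks (ties share their
    mean rank), then apply the Mann-Whitney U formula."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend over a tie group
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical labels and model scores
value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(value)                                        # 0.75
print("OK" if 0.60 <= value <= 0.95 else "ERROR")   # OK
```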

Return to the Metrics & Status tab and add the new check to the Display.

../../_images/model-checks-computed.png