Concept | Metrics

Metrics are measurements computed on datasets, folders, models, or model evaluation stores. They let you monitor the current status and evolution of these Dataiku items. For example, you could compute:

  • The number of records in a dataset

  • The size of a folder

  • The accuracy of a model

Examples of metrics and checks used on managed folders or saved models.

Metrics on datasets

The Metrics tab within datasets includes several default metrics, such as the column count and record count. You can also edit the metrics to include:

  • Column statistics such as sum, average, minimum, maximum, etc.

  • Most frequent values of columns, such as the mode or top N values

  • Column percentiles

  • Data validity, which checks whether the records match the column’s meaning. (Note that the meaning must be locked, that is, manually selected, on the dataset for this metric to apply.)

  • Custom metrics calculated using a formula or code
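To make the list above concrete, here is a minimal plain-Python sketch of what such metrics measure. It does not use Dataiku's API: the sample rows, the column names `amount` and `country`, and the helper names `column_metrics` and `is_valid_country` are all hypothetical, and the validity rule is only a stand-in for a column meaning.

```python
from collections import Counter
import statistics

# Hypothetical sample rows standing in for a dataset.
rows = [
    {"country": "FR", "amount": 10.0},
    {"country": "US", "amount": 20.0},
    {"country": "FR", "amount": 30.0},
    {"country": "n/a", "amount": 40.0},
]


def is_valid_country(value):
    """Illustrative validity rule: a two-letter uppercase code."""
    return isinstance(value, str) and len(value) == 2 and value.isupper()


def column_metrics(rows, numeric_col, categorical_col, top_n=2):
    """Compute a few dataset-level metrics. Assumes at least one row."""
    values = [r[numeric_col] for r in rows]
    categories = [r[categorical_col] for r in rows]
    counts = Counter(categories)
    return {
        "record_count": len(rows),
        "column_count": len(rows[0]),
        # Column statistics
        "sum": sum(values),
        "avg": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        # A percentile: the median is the 50th percentile
        "median": statistics.median(values),
        # Most frequent values (top N)
        "top_values": counts.most_common(top_n),
        # Share of records passing the illustrative validity rule
        "valid_ratio": sum(is_valid_country(c) for c in categories) / len(rows),
    }


metrics = column_metrics(rows, "amount", "country")
print(metrics)
```

In Dataiku itself these values come from the built-in probes configured in the interface; the sketch only illustrates what each kind of metric represents.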

Because an item can have many available metrics, you can select which ones appear as tiles on the Metrics page.

Note

You can also create custom metrics using a Python probe or SQL probe. For more information and examples, see the reference documentation and Developer Guide.
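As a rough illustration, a Python probe could look like the sketch below. It assumes that the probe's entry point is a function named `process` that receives the dataset and returns a dictionary of metric names and values, and that rows can be read with `iter_rows()`; confirm both against the reference documentation. The `email` column, the metric names, and the `FakeDataset` stand-in (used only so the example runs outside Dataiku) are hypothetical.

```python
def process(dataset):
    """Count records whose hypothetical 'email' column is missing.

    Assumed probe contract: return a dict of {metric_name: value}.
    """
    total = 0
    missing = 0
    for row in dataset.iter_rows():
        total += 1
        value = row["email"]
        # Treat both None and the empty string as missing.
        if value is None or value == "":
            missing += 1
    return {
        "missing_email_count": missing,
        "missing_email_ratio": (missing / total) if total else 0.0,
    }


class FakeDataset:
    """Hypothetical stand-in for a dataset handle, for local testing only."""

    def __init__(self, rows):
        self._rows = rows

    def iter_rows(self):
        return iter(self._rows)


sample = FakeDataset(
    [{"email": "a@example.com"}, {"email": ""}, {"email": None}, {"email": "b@example.com"}]
)
result = process(sample)
print(result)
```

Iterating row by row avoids loading the whole dataset into memory, which may matter for large datasets; a DataFrame-based version would be shorter but heavier.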

Metrics on other Flow objects

In addition to datasets, you can also set up metrics on other Flow objects:

  • Folders

  • Models

  • Model evaluation stores

For example, you can compute metrics in the Status tab of a managed folder to see the number of files it contains and its total size.

Examples of metrics on a managed folder.
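The two folder metrics mentioned above, file count and total size, can be sketched in plain Python for a local directory. This is only an analogy that uses the standard library on the local filesystem, not Dataiku's managed folder API; the function name `folder_metrics` and the metric keys are hypothetical.

```python
from pathlib import Path
import tempfile


def folder_metrics(root):
    """Return the number of files under root and their combined size in bytes."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return {
        "file_count": len(files),
        "total_size_bytes": sum(p.stat().st_size for p in files),
    }


# Usage on a throwaway directory containing two files (5 and 10 bytes).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_bytes(b"hello")
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.bin").write_bytes(b"\x00" * 10)
    demo = folder_metrics(tmp)

print(demo)
```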

Monitoring metrics

To monitor the value of a metric and ensure that it always abides by certain quality rules (for example, that it falls within a numeric range or belongs to a set of values), you can use metrics in combination with checks or data quality rules.
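The two kinds of rule mentioned above can be sketched in plain Python as follows. This is a minimal illustration of the idea, not Dataiku's implementation: the function names, the hard and soft bounds, and the status labels (`OK`, `WARNING`, `ERROR`, `EMPTY`) are assumptions chosen for the example.

```python
def check_numeric_range(value, minimum=None, soft_minimum=None,
                        soft_maximum=None, maximum=None):
    """Evaluate a metric value against optional hard and soft bounds."""
    if value is None:
        return "EMPTY"  # no metric value to evaluate
    # Hard bounds are tested first: violating them is an error.
    if minimum is not None and value < minimum:
        return "ERROR"
    if maximum is not None and value > maximum:
        return "ERROR"
    # Soft bounds only raise a warning.
    if soft_minimum is not None and value < soft_minimum:
        return "WARNING"
    if soft_maximum is not None and value > soft_maximum:
        return "WARNING"
    return "OK"


def check_value_in_set(value, allowed):
    """Evaluate a metric value against a set of accepted values."""
    if value is None:
        return "EMPTY"
    return "OK" if value in allowed else "ERROR"


# Hypothetical metric values, e.g. a record count and a most frequent value.
print(check_numeric_range(1200, minimum=100, soft_minimum=1000))
print(check_numeric_range(500, minimum=100, soft_minimum=1000))
print(check_numeric_range(50, minimum=100, soft_minimum=1000))
print(check_value_in_set("FR", {"FR", "US", "DE"}))
```

Separating hard and soft bounds lets a rule flag a drifting metric before it reaches a value that should be treated as a failure.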

For datasets:

  • If you are using Dataiku 12.6 or above, you can use data quality rules. (Note that data quality rules don’t need to be built on top of metrics, but they can be.)

  • If you are using Dataiku 12.5 or below, you can set up checks.

For folders, models, or model evaluation stores:

  • For all versions of Dataiku, use checks.