Concept | Distinct recipe#

Watch the video

The Distinct recipe allows you to filter a dataset in order to remove some of its duplicate rows.

Use case#

In the following example, we have a simple dataset that gives us information about products, their categories and their prices. To output a dataset that contains only unique rows, we apply the default configuration of the Distinct recipe.


Operation mode#

The Distinct Recipe has two configuration options:

  • Remove duplicates. This lets you identify all rows that have the exact same values on all columns and keep only one of them.

  • Find distinct values of a subset of all columns. This allows you to choose a sample of the available columns in order to compute the distinct rows related to their combinations. For example, if you select two columns, the resulting dataset will have exactly two columns.

A screenshot of the Operation mode options for the Distinct recipe.