Concept | Top N recipe

The Top N recipe allows you to filter a dataset, keeping only the rows that hold the top and bottom values of one or more columns.

Use case

You can use Top N to solve business use cases that involve the top and bottom rows of a dataset or the top and bottom rows of groupings of a dataset. For example, after building a prediction model, you might want to isolate just the top and bottom predictions so that you can set up specific marketing campaigns for each group.

Your results will depend on how you configure the sort in the tool. For example, if you sort in descending order, the highest values will be at the top.
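To make the effect of the sort order concrete, here is a minimal sketch in plain Python. It only illustrates the idea and is not how Dataiku implements the recipe; the `top_and_bottom` helper and the `predictions` rows are hypothetical.

```python
def top_and_bottom(rows, key, n, descending=True):
    """Return the first n and last n rows after sorting by `key`."""
    ordered = sorted(rows, key=lambda r: r[key], reverse=descending)
    top = ordered[:n]
    # Start the bottom slice no earlier than index n so that a row is
    # never returned twice when the dataset has fewer than 2 * n rows.
    bottom = ordered[max(n, len(ordered) - n):]
    return top, bottom


# Hypothetical prediction scores for five customers.
predictions = [
    {"customer": "A", "score": 0.91},
    {"customer": "B", "score": 0.15},
    {"customer": "C", "score": 0.67},
    {"customer": "D", "score": 0.42},
    {"customer": "E", "score": 0.88},
]

top, bottom = top_and_bottom(predictions, key="score", n=2)
print([r["customer"] for r in top])     # the two highest scores
print([r["customer"] for r in bottom])  # the two lowest scores
```

Passing `descending=False` reverses the roles: the lowest scores come back as the top rows and the highest as the bottom rows.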

Slide depicting how table of data extracts top and bottom rows.

The recipe's configuration panel offers additional options for advanced use cases.

You can retrieve rows from the whole dataset or from within groups of rows, such as the rows for each country. For example, if your dataset contains a revenue prediction by customer as well as the customer’s country, you could use Top N to retrieve the 10 most promising customers for each country.
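The per-group case can be sketched the same way in plain Python. This is an illustration under stated assumptions, not the recipe's actual implementation: the `top_n_per_group` helper, the column names, and the sample rows are all hypothetical, and `n` is set to 2 rather than 10 to keep the data small.

```python
from collections import defaultdict


def top_n_per_group(rows, group_key, sort_key, n):
    """Return the n rows with the highest `sort_key` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for name in sorted(groups):  # iterate groups in a stable order
        ranked = sorted(groups[name], key=lambda r: r[sort_key], reverse=True)
        result.extend(ranked[:n])
    return result


# Hypothetical revenue predictions by customer and country.
customers = [
    {"customer": "A", "country": "FR", "predicted_revenue": 1200},
    {"customer": "B", "country": "FR", "predicted_revenue": 300},
    {"customer": "C", "country": "FR", "predicted_revenue": 950},
    {"customer": "D", "country": "US", "predicted_revenue": 400},
    {"customer": "E", "country": "US", "predicted_revenue": 2100},
]

best = top_n_per_group(customers, "country", "predicted_revenue", n=2)
for row in best:
    print(row["country"], row["customer"], row["predicted_revenue"])
```

Each country contributes at most two rows, so a group with fewer rows than `n` is returned in full.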

Top N configuration

Let’s see how we can configure the Top N recipe in Dataiku.

Screenshot of the Top N recipe in Dataiku.
  1. Specify the number of rows you want Dataiku to retrieve.

  2. Select the dataset columns to sort by and set the sort order for each.

  3. Choose whether or not to retrieve the top and bottom rows for groupings of the dataset.

  4. Optionally, select extra columns to compute for each row.

Note

Count of rows in its group will create a column that counts the number of rows originally present in your subgroups.

Row number within its group will create a column of the unique row number for each row within your subgroups.

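Rank of row within its group will create a column containing the rank of each row within your subgroups. With a standard rank, tied rows share the same rank and the ranks that follow skip ahead (1, 1, 3).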

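Dense rank of row within its group will create a column containing the dense rank of each row within your subgroups. With a dense rank, tied rows share the same rank and no ranks are skipped (1, 1, 2).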
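The difference between these four columns is easiest to see on a group that contains a tie. The sketch below is plain Python and assumes the usual window-function conventions for rank and dense rank; the `add_group_columns` helper and its output keys are hypothetical names, not Dataiku's column names.

```python
def add_group_columns(values):
    """Annotate the values of ONE group, sorted in descending order."""
    ordered = sorted(values, reverse=True)
    annotated = []
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # rank jumps past any tied rows
            dense_rank += 1    # dense rank always increases by one
            previous = value
        annotated.append({
            "value": value,
            "count": len(ordered),     # rows originally in the group
            "row_number": row_number,  # unique position in the group
            "rank": rank,
            "dense_rank": dense_rank,
        })
    return annotated


# A hypothetical group of four values with one tie (80 appears twice).
rows = add_group_columns([50, 80, 80, 30])
for row in rows:
    print(row)
```

In this sketch the two tied rows get distinct row numbers but the same rank and dense rank; the next row then receives rank 3 but dense rank 2.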