Tutorial | Top N recipe#

Let’s try out the Top N recipe to isolate the biggest purchases found in a practice dataset.

Get started#

Objectives#

In this tutorial, you will:

  • Find the five most expensive purchases recorded in a dataset.

  • Find the five most expensive purchases per item category in the dataset.

Prerequisites#

To reproduce the steps in this tutorial, you’ll need:

  • Access to an instance of Dataiku 12+.

  • Basic knowledge of Dataiku (Core Designer level or equivalent).

Create the project#

  1. From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Top N Recipe.

  2. From the project homepage, click Go to Flow.

Note

You can also download the starter project from this website and import it as a zip file.

Next, build the Flow.

  1. Click Flow Actions at the bottom right of the Flow.

  2. Click Build all.

  3. Keep the default settings and click Build.

Create the Top N recipe#

We’ll create a Top N recipe from the tx_prepared dataset.

  1. Select the tx_prepared dataset and click the Top N recipe in the Actions tab.

  2. Change the output name to tx_topn and click Create Recipe.

A screenshot of the "New topn recipe" dialogue window.

Find most expensive purchases#

To find the five largest purchases in a dataset:

  1. Retrieve the 5 top rows, and 0 bottom rows.

  2. Select the purchase_amount column for sorting.

  3. Change the sort to descending order, so the most expensive orders appear at the top of the dataset.

  4. Run the recipe and open the output dataset.

A screenshot of the Top N step in the Top N recipe.

As you can see, the output dataset consists of just five records: the most expensive purchases in the whole dataset. Let’s add a little more complexity.
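For readers who think in code, the selection configured above can be sketched in plain Python. This is only an illustration of the logic (sort by purchase_amount in descending order, keep the first five rows), not how Dataiku runs the recipe internally. The sample rows and the top_n helper are hypothetical; only the purchase_amount and item_category column names come from the tutorial.

```python
from operator import itemgetter

# Hypothetical sample rows standing in for tx_prepared (values are made up).
transactions = [
    {"item_category": "A", "purchase_amount": 120.0},
    {"item_category": "B", "purchase_amount": 75.5},
    {"item_category": "A", "purchase_amount": 310.0},
    {"item_category": "C", "purchase_amount": 42.0},
    {"item_category": "D", "purchase_amount": 199.9},
    {"item_category": "B", "purchase_amount": 260.0},
    {"item_category": "C", "purchase_amount": 15.0},
    {"item_category": "D", "purchase_amount": 88.0},
]


def top_n(rows, column, n=5):
    """Return the n rows with the largest `column` values, largest first."""
    # Sort descending on the chosen column, then keep the first n rows
    # (the "5 top rows, 0 bottom rows" setting in the tutorial).
    return sorted(rows, key=itemgetter(column), reverse=True)[:n]


top5 = top_n(transactions, "purchase_amount", n=5)
for row in top5:
    print(row)
```

Sorting the whole list is the simplest way to express the idea; for very large inputs, a partial selection such as heapq.nlargest would avoid the full sort.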

Group by item category#

Here, we’ll try a different example to find the five biggest purchases in the dataset per item category.

  1. Reopen the Top N recipe.

  2. Beneath from, select each group of rows identified by….

  3. In the dropdown that appears, choose item_category as the key column.

  4. Check the row number within its group checkbox.

A screenshot of the Top N step showing how to group top purchases by item category.

Retrieve columns#

To make the output easier to interpret:

  1. Navigate to the Retrieve columns step.

  2. Change the Mode to Select columns.

  3. Move all of the columns into the Available columns section using the left-facing double arrow.

  4. Move item_category and purchase_amount back to the Selected columns section using the right-facing single arrow.

  5. Run the recipe and then open the output dataset.

The output should have three columns: item_category, purchase_amount, and _row_number.

Screenshot of the output dataset showing top purchases grouped by item category.

You’ll see that for each category (A, B, C, and D) there are five purchase amounts, listed in decreasing order within their group. The _row_number values confirm that each group contains five rows.
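The grouped version can be sketched the same way: split the rows by item_category, rank each group by purchase_amount in descending order, keep the top rows, and record each row's position within its group. The code below is a minimal sketch under stated assumptions, not Dataiku's implementation. The sample rows, the transaction_id column, and the top_n_per_group helper are hypothetical, and numbering from 1 is an assumption about how _row_number is counted. The example keeps two rows per group to stay short; the tutorial keeps five.

```python
from collections import defaultdict

# Hypothetical sample rows; transaction_id is an invented extra column
# used to show the effect of selecting only some columns.
transactions = [
    {"transaction_id": 1, "item_category": "A", "purchase_amount": 120.0},
    {"transaction_id": 2, "item_category": "B", "purchase_amount": 75.5},
    {"transaction_id": 3, "item_category": "A", "purchase_amount": 310.0},
    {"transaction_id": 4, "item_category": "C", "purchase_amount": 42.0},
    {"transaction_id": 5, "item_category": "A", "purchase_amount": 50.0},
    {"transaction_id": 6, "item_category": "B", "purchase_amount": 260.0},
    {"transaction_id": 7, "item_category": "C", "purchase_amount": 15.0},
    {"transaction_id": 8, "item_category": "C", "purchase_amount": 99.0},
]


def top_n_per_group(rows, group_col, sort_col, keep, n=5):
    """Return the top n rows per group, with a 1-based _row_number per group."""
    # Bucket the rows by their group key.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)

    output = []
    # Groups are emitted in sorted key order here for readability only;
    # that ordering is an assumption of this sketch.
    for key in sorted(groups):
        ranked = sorted(groups[key], key=lambda r: r[sort_col], reverse=True)[:n]
        for rank, row in enumerate(ranked, start=1):
            selected = {col: row[col] for col in keep}  # "Select columns" mode
            selected["_row_number"] = rank  # assumed to start at 1
            output.append(selected)
    return output


result = top_n_per_group(
    transactions,
    group_col="item_category",
    sort_col="purchase_amount",
    keep=["item_category", "purchase_amount"],
    n=2,
)
for row in result:
    print(row)
```

Each output row carries only the two selected columns plus _row_number, which mirrors the three-column output described above.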

What’s next?#

You just practiced using the Top N recipe to find the most expensive transactions in the dataset.

See also

For more information on this recipe, see the Top N: retrieve first N rows article in the reference documentation.

To try out more visual recipes, visit our page on Visual Recipes!