Tutorial | Top N recipe
Let’s try out the Top N recipe to isolate the biggest purchases found in a practice dataset.
In this tutorial, you will:
Find the five most expensive purchases recorded in a dataset.
Find the five most expensive purchases per item category in the dataset.
To reproduce the steps in this tutorial, you’ll need:
Create the project
From the Dataiku Design homepage, click + New Project > DSS tutorials > Advanced Designer > Top N recipe.
From the project homepage, click Go to Flow.
You can also download the starter project from this website and import it as a zip file.
Next, build the Flow.
Click Flow Actions at the bottom right of the Flow.
Click Build all.
Keep the default settings and click Build.
We’ll create a Top N recipe from the tx_prepared dataset.
Select the tx_prepared dataset and click on the Top N recipe from the Actions tab.
Change the output name to tx_topn and click Create Recipe.
To find the five largest purchases in a dataset:
Retrieve the 5 top rows, and
Select the purchase_amount column for sorting.
Run the recipe and open the output dataset.
As you can see, the output dataset consists of just five records: the most expensive purchases in the whole dataset. Let’s add a little more complexity.
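For readers who think in code, the logic of this first step can be sketched in pandas. This is only an illustration of the sort-and-keep-five idea, not how Dataiku runs the recipe; the sample rows below are invented, and only the names tx_prepared, tx_topn, and purchase_amount come from the tutorial.

```python
import pandas as pd

# Hypothetical stand-in for tx_prepared; the values are invented for illustration.
tx_prepared = pd.DataFrame({
    "item_category": ["A", "B", "A", "C", "D", "B", "C", "D"],
    "purchase_amount": [120.0, 75.5, 310.2, 42.0, 99.9, 250.0, 18.3, 500.0],
})

# Sort by purchase_amount from largest to smallest, then keep the first five rows.
tx_topn = (
    tx_prepared.sort_values("purchase_amount", ascending=False)
    .head(5)
    .reset_index(drop=True)
)

print(tx_topn)
```

The same result could be obtained with `DataFrame.nlargest(5, "purchase_amount")`; the explicit sort is used here because it mirrors the recipe's "sort, then retrieve the top rows" settings.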
Here, we’ll try a different example to find the five biggest purchases in the dataset per item category.
Reopen the Top N recipe.
Beneath from, select each group of rows identified by….
In the dropdown that appears, choose item_category as the key column.
Check the row number within its group checkbox.
To make the output easier to interpret:
Navigate to the Retrieve columns step.
Change the Mode to Select columns, keeping only item_category and purchase_amount.
Run the recipe and then open the output dataset.
The output should have three columns: item_category, purchase_amount, and _row_number.
You’ll see that each category (A, B, C, and D) has five purchase amounts, listed in decreasing order within the group. The _row_number values confirm that each group contains five rows.
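The grouped version can be sketched the same way in pandas. Again, this is an assumption-laden illustration rather than the recipe's actual implementation: the data is invented, it has only two categories, and N is set to 2 instead of 5 so that the tiny sample stays readable. The column names item_category, purchase_amount, and _row_number follow the tutorial.

```python
import pandas as pd

# Hypothetical stand-in for tx_prepared; the values are invented for illustration.
tx_prepared = pd.DataFrame({
    "item_category": ["A", "A", "A", "B", "B", "B", "B"],
    "purchase_amount": [10.0, 30.0, 20.0, 5.0, 50.0, 40.0, 45.0],
})

N = 2  # the tutorial retrieves 5 rows per group; 2 keeps this sample small

# Order rows so that, inside each category, the largest purchases come first.
ranked = tx_prepared.sort_values(
    ["item_category", "purchase_amount"], ascending=[True, False]
)

# Number the rows within each category, starting at 1.
ranked["_row_number"] = ranked.groupby("item_category").cumcount() + 1

# Keep the top N rows per category and only the three columns of interest.
tx_topn = ranked.loc[
    ranked["_row_number"] <= N,
    ["item_category", "purchase_amount", "_row_number"],
].reset_index(drop=True)

print(tx_topn)
```

Computing the row number first and filtering on it afterwards keeps the rank available as an output column, which is what the row number checkbox adds in the recipe.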