Concept | Sort recipe
The Sort recipe allows you to sort the rows of an input dataset by the values of one or more columns in the dataset.
Use case
Let’s say that our dataset provides customer information and includes a revenue prediction column.
Our goal is to output a dataset sorted by country and, within each country, by revenue prediction in descending order.
Sort configuration
Here’s how we can configure the Sort recipe:
By default, the Sort recipe sorts the values of each sort column in ascending order. To meet our business goal, we’ll change the sort option for the prediction column so that revenue predictions sort in descending order.
Additionally, we need to pay careful attention to the priority of the sort columns. Here, we make sure that the data is sorted first by ip_country, and then by prediction within each country.
Note
You can change the priority of the sort columns by dragging and dropping their fields.
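To make the column priority concrete, here is a minimal Python sketch of the same two-level sort. It is only an illustration of the sorting logic, not how Dataiku implements the recipe, and the sample rows are hypothetical; only the column names ip_country and prediction come from the example above.

```python
# Hypothetical sample rows; only the column names come from the example above.
rows = [
    {"ip_country": "France", "prediction": 120.0},
    {"ip_country": "Brazil", "prediction": 80.0},
    {"ip_country": "France", "prediction": 300.0},
    {"ip_country": "Brazil", "prediction": 95.5},
]

# Sort by ip_country ascending first, then by prediction descending.
# Negating the numeric prediction reverses its order inside each country.
sorted_rows = sorted(rows, key=lambda r: (r["ip_country"], -r["prediction"]))

for row in sorted_rows:
    print(row["ip_country"], row["prediction"])
```

Swapping the two elements of the key tuple would sort by prediction first and use ip_country only to break ties, which is the effect of reordering the sort columns.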
Finally, we can choose to compute additional columns for each row: the row number, the rank, and the dense rank. We’ll select all three options so we can see each one in the output.
Output
After running the recipe, our output dataset contains rows sorted first by the customer’s country of origin, then by the prediction of revenue.
In addition, Dataiku has appended three computed columns, described in the table below.
Column | Description |
---|---|
_row_number | Contains the position of each row in the sorted output. |
_rank | Contains the row’s rank based on its values in the sorting column(s). Tied rows share the same rank, and the ranks that follow skip ahead by the number of tied rows (for example: 1, 2, 2, 4). |
_dense_rank | Contains the dense rank of each row. Tied rows also share the same rank, but the ranks stay consecutive because none are skipped (for example: 1, 2, 2, 3). |
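The difference between the three columns is easiest to see with tied values. The sketch below is a plain Python illustration under the assumption that two rows tie when their sort keys are equal; the function name add_rank_columns and the sample values are hypothetical, and this is not Dataiku's implementation.

```python
def add_rank_columns(sorted_keys):
    """Return (row_number, rank, dense_rank) for keys that are already sorted."""
    result = []
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that never equals a real key
    for row_number, key in enumerate(sorted_keys, start=1):
        if key != previous:
            rank = row_number  # jumps past any tied rows before this one
            dense_rank += 1    # always advances by exactly one
            previous = key
        result.append((row_number, rank, dense_rank))
    return result


# Hypothetical predictions, already sorted in descending order, with one tie.
predictions = [300.0, 120.0, 120.0, 95.5]
ranks = add_rank_columns(predictions)

for prediction, (row_number, rank, dense_rank) in zip(predictions, ranks):
    print(prediction, row_number, rank, dense_rank)
```

The two rows with a prediction of 120.0 both receive rank 2. The next row receives rank 4 but dense rank 3, while the row number simply counts from 1 to 4.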
You can select whichever of these computations fit your own use case.
What’s next?
Tip
You can find this content (and more) by registering for the Dataiku Academy course, Visual Recipes. When ready, challenge yourself to earn a certification!