Tutorial | Group the data (Core Designer part 6)
After the tutorial introducing statistical worksheets, we have prepared a dataset and have done some preliminary statistical exploration.
If our ultimate goal is to understand our customers, we’ll need to group all past orders by unique customers, aggregating their past interactions. To do this, we’ll use another visual recipe, Group.
In this tutorial, you will:
Use the Group recipe to aggregate data.
A screencast at the end of the page recaps all of the actions described here.
Let’s get started!
From the Flow, select the orders_prepared dataset.
In the Actions tab of the right panel (+ button), choose Group from the list of Visual recipes. The Group recipe aggregates the values of some columns by the values of one or more keys.
In the recipe dialog, choose to group by customer_id.
Change the name of the output dataset to orders_by_customer.
Select Create Recipe.
The core step of the Group recipe is the Group step, where you choose which columns serve as keys and which aggregations to perform.
On the Group step, in the Per field aggregations section, select the following aggregations:
Min of order_date
Avg of pages_visited
Sum of total
For each unique customer ID, the output will have the date of the first order, the average number of visited pages per visit, and the sum of all order totals. We’ll also compute the count for each group, which is a default setting.
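For readers who think in code, the same aggregation can be sketched outside the visual recipe with pandas. This is only an illustration, not what the recipe runs internally: the rows below are invented stand-ins for orders_prepared, and the output names pages_visited_avg and total_sum are assumed by analogy with order_date_min.

```python
import pandas as pd

# Hypothetical toy rows standing in for orders_prepared; all values are invented.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": ["a", "a", "b", "b"],
    "order_date": pd.to_datetime(
        ["2016-03-01", "2016-01-15", "2016-02-10", "2016-02-20"]
    ),
    "pages_visited": [4, 6, 3, 5],
    "total": [20.0, 35.0, 10.0, 15.0],
})

# One output row per customer_id, with one column per requested aggregation.
orders_by_customer = (
    orders.groupby("customer_id")
    .agg(
        order_date_min=("order_date", "min"),         # date of first order
        pages_visited_avg=("pages_visited", "mean"),  # assumed output name
        total_sum=("total", "sum"),                   # assumed output name
        count=("order_id", "size"),                   # number of rows in the group
    )
    .reset_index()
)

print(orders_by_customer)
```

Each named argument to agg pairs an input column with an aggregation, which mirrors ticking one box per column in the Per field aggregations section.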
The recipe reminds us of the storage type of each column in the Per field aggregations tile. We are able to retrieve the minimum of order_date because its storage type is a date. If it were a string, the “minimum” would be the first result in alphabetical order.
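To see why the storage type matters, here is a small pandas sketch comparing the minimum of the same values stored as text and as dates. The dates and the day/month/year format are made up for the example.

```python
import pandas as pd

# Hypothetical order dates written as day/month/year text.
as_text = pd.Series(["15/01/2016", "01/03/2016", "10/02/2016"])

# The same values parsed into a real date type.
as_dates = pd.to_datetime(as_text, format="%d/%m/%Y")

# On strings, min() compares character by character (alphabetical order).
print(as_text.min())   # 01/03/2016

# On dates, min() returns the chronologically earliest value.
print(as_dates.min())  # 2016-01-15 00:00:00
```

The string minimum is 1 March because "01" sorts before "10" and "15", even though 15 January is the earliest actual date.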
Before running the recipe, navigate to the Output step.
Rename order_date_min to
Click Run to create the new grouped output dataset.
Columns in the input dataset not used in the group key or per field aggregations (like order_id and tshirt_category) are not included in the output dataset.
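The same behavior shows up in a pandas sketch of a group-by: columns that are neither keys nor aggregated simply do not appear in the result. The rows are invented, and total_sum is an assumed output name.

```python
import pandas as pd

# Hypothetical rows; order_id and tshirt_category are neither keys nor aggregated.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": ["a", "a", "b"],
    "tshirt_category": ["Hoodie", "Tee", "Tee"],
    "total": [20.0, 35.0, 10.0],
})

grouped = orders.groupby("customer_id", as_index=False).agg(
    total_sum=("total", "sum")
)

# Input columns that were not used as a key or in an aggregation.
unused = [c for c in orders.columns if c not in ("customer_id", "total")]

print(list(grouped.columns))
print([c for c in unused if c in grouped.columns])  # unused columns that survived
```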
Let’s quickly observe the output.
Open the orders_by_customer dataset.
Click on the customer_id column dropdown, and select Analyze.
Note that all values are unique. We have exactly one record for every customer after grouping by customer_id.
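The check performed with Analyze can also be expressed as a quick programmatic test. The sketch below uses pandas on invented rows, not the tutorial dataset, and total_sum is an assumed output name.

```python
import pandas as pd

# Hypothetical orders: three customers, six orders in total.
orders = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c", "c"],
    "total": [20.0, 35.0, 10.0, 5.0, 7.5, 12.5],
})

grouped = orders.groupby("customer_id", as_index=False).agg(
    total_sum=("total", "sum")
)

# After grouping, the key column should contain no duplicates...
print(grouped["customer_id"].is_unique)

# ...and the row count should equal the number of distinct customers in the input.
print(len(grouped) == orders["customer_id"].nunique())
```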