Tutorial | Group the data (Core Designer part 6)

Having completed the tutorial introducing statistical worksheets, we now have a prepared dataset and some preliminary statistical exploration behind us.

If our ultimate goal is to understand our customers, we’ll need to group all past orders by unique customers, aggregating their past interactions. To do this, we’ll use another visual recipe, Group.

Objectives

In this tutorial, you will:

  • Use the Group recipe to aggregate data.

Starting here?

If you skipped the previous sections, you need to complete the tutorial on preparing data so you have the orders_prepared dataset.

Create a Group recipe

Tip

A screencast at the end of the page recaps all of the actions described here.

Let’s get started!

  1. From the Flow, select the orders_prepared dataset.

  2. In the Actions tab of the right panel (+ button), choose Group in the list of Visual recipes. The Group recipe allows you to aggregate the values of some columns by the values of one or more keys.

  3. In the recipe dialog, choose to group by customer_id.

  4. Change the name of the output dataset to orders_by_customer.

  5. Select Create Recipe.

Create the Group recipe.

Select aggregations by group key

The core step of the Group recipe is the Group step, where you choose which columns serve as keys and which aggregations to perform on the remaining columns.

  1. On the Group step, in the Per field aggregations section, select the following aggregations:

    • Min of order_date

    • Avg of pages_visited

    • Sum of total

    For each unique customer ID, the output will have the date of the first order, the average number of pages visited per order, and the sum of the totals of all orders. We’ll also compute the count for each group, which is a default setting.

    Group step in the Settings tab of a Group recipe.

    Note

    The recipe reminds us of the storage type of each column in the Per field aggregations tile. We are able to retrieve the minimum of order_date because its storage type is a date. If it were a string, the “minimum” would be the first result in alphabetical order.

  2. Before running the recipe, navigate to the Output step.

  3. Rename order_date_min to first_order_date.

  4. Click Run to create the new grouped output dataset.

Output column names in the Output step of the Settings tab of a Group recipe.
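The storage-type note in step 1 can be illustrated outside the recipe with a minimal pandas sketch. The sample values and the day-first date format are assumptions made for illustration; they are not taken from the orders_prepared dataset.

```python
import pandas as pd

# Hypothetical order dates stored as plain strings, written day-first.
raw = pd.Series(["15/01/2017", "02/03/2016", "10/12/2015"])

# Minimum of strings: compared character by character (alphabetical order).
string_min = raw.min()

# Minimum of dates: parse first, then compare chronologically.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
date_min = parsed.min()

print(string_min)       # 02/03/2016
print(date_min.date())  # 2015-12-10
```

The string minimum picks the value that sorts first alphabetically, which is not the earliest order. This is why the storage type of order_date matters for the Min aggregation.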

Note

Columns in the input dataset not used in the group key or per field aggregations (like order_id and tshirt_category) are not included in the output dataset.
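For readers who think in code, the aggregation configured above is roughly equivalent to the following pandas sketch. It is not how the Group recipe is implemented. The sample rows are hypothetical, and the output names pages_visited_avg and total_sum are assumed to follow the same pattern as order_date_min.

```python
import pandas as pd

# Hypothetical sample standing in for the orders_prepared dataset.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "customer_id": ["c1", "c1", "c2", "c1"],
    "order_date": pd.to_datetime(
        ["2016-03-02", "2015-12-10", "2017-01-15", "2016-07-01"]
    ),
    "pages_visited": [4, 6, 3, 5],
    "tshirt_category": ["A", "B", "A", "C"],
    "total": [20.0, 35.0, 15.0, 10.0],
})

# Group by the key and compute one aggregate per selected column.
orders_by_customer = (
    orders.groupby("customer_id")
    .agg(
        first_order_date=("order_date", "min"),       # Min of order_date, renamed
        pages_visited_avg=("pages_visited", "mean"),  # Avg of pages_visited
        total_sum=("total", "sum"),                   # Sum of total
        count=("order_id", "size"),                   # number of rows per group
    )
    .reset_index()
)

print(orders_by_customer)
```

As in the recipe output, order_id and tshirt_category do not appear in the result because they are neither a group key nor an aggregated column.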

Explore the output dataset

Let’s take a quick look at the output.

  1. Open the orders_by_customer dataset.

  2. Click on the customer_id column dropdown, and select Analyze.

Exploring a column of the output dataset using the Analyze tool.

Note

All values are unique: after grouping by customer_id, we have exactly one record for every customer.
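The same uniqueness check that Analyze provides can be sketched in pandas. The small DataFrame below is a hypothetical stand-in for orders_by_customer.

```python
import pandas as pd

# Hypothetical grouped output: one row per customer.
orders_by_customer = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "count": [3, 1, 2],
})

# True when no customer_id value appears more than once.
is_unique = orders_by_customer["customer_id"].is_unique

# Equivalent check: number of rows equals number of distinct keys.
n_rows = len(orders_by_customer)
n_customers = orders_by_customer["customer_id"].nunique()

print(is_unique, n_rows == n_customers)
```

If either check failed, the grouping key would not identify one row per customer, which would point to a different key having been used in the Group step.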

See a video covering all the steps of this tutorial

What’s next?

Now that you have a few datasets and recipes in the Flow, it’s time to take stock of what you’ve accomplished. Continue to the next tutorial on exploring the Flow.