Hands-On Tutorial: Charts and Pivot Tables¶
Charts and pivot tables are key tools for visualizing data. Let’s explore how to use these native tools in Dataiku!
In this tutorial, you will:
create charts, including editing display labels and formatting axes,
create a pivot table with multiple levels of hierarchy,
filter your chart, and
publish your chart on a dashboard.
To learn how to manage a dashboard, see the tutorial on dashboard management.
We’ll reuse the Flow from the AI Consumer quick start.
From the Dataiku homepage, click +New Project > DSS Tutorials > Quick Start > AI Consumer Quick Start (Tutorial).
This project is a simplified credit card fraud use case. Using data about transactions, merchants, and cardholders, we have a model that predicts which transactions should be authorized and which are potentially fraudulent.
For the target variable, authorized_flag, a score of:
1 represents an authorized transaction.
0 is a transaction that failed authorization.
Let’s imagine that we are interested in the relationship between FICO score (a measure of creditworthiness in the US) and whether a purchase is authorized or not. To explore this relationship, we’ll investigate using the transactions_known dataset, which is the training data for the model in this Flow.
This video walks through all the steps described in this section. Play and pause the video as you read the instructions and build your chart.
First, let’s create a vertical bar chart to explore the relationship between FICO score and purchase authorization.
Open the transactions_known dataset, and navigate to the Charts tab.
Drag purchase_amount to the Y axis field.
Drag card_fico_score to the X axis field.
Drag authorized_flag to the Color field.
The average purchase amount for an unauthorized purchase already seems much higher across all FICO scores, but let’s start refining the plot.
Click the authorized_flag dropdown to change the Binning mode to None, use raw values.
Before investigating average purchases, let’s start with counts first.
Click the purchase_amount dropdown to change the Aggregation function to Count non-null.
For card_fico_score, open the dropdown, and check the Generate one tick per bin box.
We now have a basic chart plotting counts of authorized and unauthorized purchases by FICO range. Of course, though, this is just a sample.
Since the dataset is relatively small, let’s view all of it instead of a sample.
In the Sampling & Engine tab on the left, uncheck Use same sample as explore.
Change the Sampling method to No sampling (whole data).
Save and refresh the sample.
We now have the full data in the chart.
Let’s make the chart more presentable.
Click the Y axis dropdown to the left of the purchase_amount measure, and provide the cleaner axis label
In the same dropdown, change the Multiplier to Thousands.
Click the X axis dropdown, and provide a more presentable axis label:
Do you remember what the values 0 and 1 represent? Let’s change the colors to make it easier to recall.
Click the color droplet to the left of the authorized_flag dimension to adjust the palette.
Click on the colored circle to the left of the number, and change the 0 to red (for transactions that failed authorization).
Change the 1 to green (those that passed).
Not bad! As we’d expect, we have relatively few unauthorized transactions compared to those authorized, and this holds true for every level of FICO range.
Dataiku allows you to filter your charts on the data from one or several columns from the source dataset.
Let’s say you want to explore the relationship between FICO score and whether a purchase is authorized or not, only for specific item categories.
So, it means that you need to add a filter on the item_category column.
To do so, follow the steps below:
In the left panel, under Filters, drag and drop the item_category column. By default, all four categories (A, B, C and D) are listed and selected.
Unselect categories B, C and D.
All filters have an option menu in their headers that provides options based on the filter type:
To edit, disable or remove a filter:
Click the More options menu in the filter header.
Try out any options.
The table below describes the different options:
Disable filter: Disables the filter so that it does not impact the chart.
Remove filter: Removes the filter from the Filters section.
Only relevant values / All values in sample: These options are relevant when you have several filters in the Filters section. So, when you have 2 or more filters defined:
each filter with the All values in sample option enabled will display all values,
each filter with the Only relevant values option enabled will only display the values that are relevant based on other filters selection. By default, Dataiku enables the Only relevant values option.
Include other values: When you change the sampling method or update the dataset, if you select this option, the filter adds the new values and checks them to enable them.
Exclude other values: When you change the sampling method or update the dataset, if you select this option, the filter adds the new values but uncheck them to disable them.
Counts aside, is there a difference in the average purchase amount between authorized and unauthorized transactions? Moreover, how does this differ not only by FICO score, but also by item category? Let’s explore this question with a pivot table.
Still in the Charts tab of the transactions_known dataset, click the + Chart button at the bottom to create a new chart.
From the chart type dropdown, choose Pivot table.
Drag purchase_amount to the Value field.
Drag authorized_flag to the Columns field.
Drag card_fico_score to the Rows field.
We have a pivot table! Let’s start refining it.
This video walks through all the steps described in this section. Play and pause the video as you read the instructions and build your table.
Let’s first define clearer labels.
Click on the card_fico_score dropdown to change the number of bins to
10and set the Display label to
Change the Binning mode of authorized_flag to None, use raw values and set the Display label to
$prefix to the purchase_amount measure and set the Display label to
If we were looking at another aggregation of purchase amounts, such as a sum, then the Multiplier dropdown might be useful, as we could format the output to be more readable.
We now have a view of the average purchase between authorized and unauthorized transactions per FICO range.
Let’s explore variation by item_category. We can do this by adding extra levels of hierarchy to the pivot table (in either horizontal or vertical directions).
Drag item_category as the first item in the Columns field so it’s the left-most level of hierarchy in the table.
Click the item_category dropdown to change the sorting to Natural ordering and set the Display label to
For any FICO range, we can now observe the average purchase amount per item category, as well as expand columns within each category to see the breakdown between authorized and unauthorized transactions.
In the table, click the < and > next to category A to collapse and expand the table for that category, hiding and revealing a side-by-side comparison of authorized and unauthorized average purchases only for items in category A.
Experiment with these settings using the Expand/Collapse menu on the right side of the table. When finished, click Expand all.
Depending on your objectives, you may prefer to change the hierarchy of the table. Change the order of variables in the Columns field to view the impact. You can also add even more layers of hierarchy. Add a third variable like signature_provided to the Columns field to test this out! Remove it when done to match the rest of the instructions.
It’s easy to get lost in this table. Let’s add a color scale to help spot any outliers.
Drag purchase_amount to the Color field.
Click the color droplet, and change the color palette to Viridis to observe starker differences.
Let’s also remove the item_category subtotals for a clearer side-by-side comparison of authorized and unauthorized transactions.
On the right side of the table, click Display totals.
Uncheck the box for Totals > Columns so that only grand totals remain.
After expanding all item categories, unauthorized purchases tend to have a greater average purchase amount than those authorized (see the yellow and green cells compared to the purple). This seems particularly apparent among cardholders with lower FICO scores. Let’s drill down further.
Click on the first cell in the table to pin the tooltip.
Click the drill down icon next to FICO range.
Adjust the card_fico_score filter to include scores ranging from 300 to 500. Remember you can click on the value and rewrite it, instead of manually dragging the range endpoints.
We now have much narrower bin ranges for FICO scores. Let’s take a closer look.
Click the Expand chart icon at the top right of the table.
If you haven’t already done so, use the Expand/Collapse tool at the right of the table to expand all columns.
This is a pivot table we can share with stakeholders!
Reduce the table size, and click Download to generate an .xlsx file of the final table.
Why are some chart fields in a blue box and others in a green one? In charts, measures are blue, while dimensions are green. Compare this with the previous chart to confirm!
Beyond creating an export file, you can also share charts on dashboards for an interactive experience within Dataiku. To do so, you must publish your chart. In this tutorial, we’ll publish the pivot table we’ve just created.
The video at the end of this section walks through all the steps described here. Play and pause the video as you read the instructions and edit the dashboard.
From the pivot table in the Charts tab of the transactions_known dataset, click Publish.
Change the dashboard to Purchase Patterns, and click Create to generate the pivot table insight (the box on the left should be checked), and add it to the dashboard in one step.
On the Edit tab of the dashboard, increase the size of the pivot table insight (the video below is one example layout), and click Save.
Navigate to the View tab of the dashboard to see it displayed.
When you publish a chart, the object generated on the dashboard is called an insight. You can view all the created insights in the Insights menu of the top navigation bar (or by pressing the
I keys of your keyboard).
Recall that the chart insight we just created is entirely independent of the original chart. Modifications to the original chart object do not impact the chart insight; modifications to the chart insight do not affect the original chart object.
Congratulations! You’ve created not just exploratory charts and pivot tables, but polished versions that are ready to be presented to stakeholders.
The next step might be managing the dashboards.