Market Basket Analysis

Overview

Business Case

Personalization is a huge opportunity for Retail and CPG businesses: 80% of companies report seeing an uplift since implementing personalization, which includes recommending relevant products to users. Several techniques can be used to build relevant recommendations; one of them is Market Basket Analysis, used by retailers to increase sales by better understanding customer purchasing patterns. It relies on the analysis of large purchase history datasets to identify products that are likely to be purchased together.

One of the most famous examples is the “Frequently bought together” section that large e-commerce sites display on their product pages. The technique can also be leveraged by brick-and-mortar stores: for example, a sports shop could choose to place running shoes next to swimsuits based on the analysis to increase sales. Overall, it is a powerful way to generate value across several use cases: optimizing product placement both online and offline, offering product bundle deals, etc. By driving additional sales for the retailer and enhancing the shopping experience for customers, Market Basket Analysis is also a key asset for building brand loyalty.

The solution consists of a data pipeline that computes association rules, identifies product recommendations for customers, and in doing so opens up a wide range of product and purchasing analyses. Analysts can connect their own transaction data and surface the outputs in a dashboard or interactive webapp. Data Scientists can use this solution as an initial building block to develop advanced analytics and support decision making. Roll-out and customization services can be offered on demand.

Technical Requirements

Warning

It is strongly recommended to read the project wiki before using this solution as it further explains many of the parameters, requirements, and deliverables throughout the solution.

To leverage this solution, you must meet the following requirements:

  • Have access to a DSS 9.0+ instance

  • To benefit natively from the solution, a PostgreSQL or Snowflake connection storing your transaction data (see Data Requirements) is needed

  • A Python 3.6 code environment named solution_market-basket-analysis with the following required packages:

mlxtend==0.18.0
dateparser==1.0.0
regex==2022.3.2
Flask==2.0.1

Note

Dataiku Online instances will auto-install these requirements when the Solution is created.

Installation

This solution is available to install on Dataiku and Dataiku Online instances.

Installing on your Dataiku Instance

If the technical requirements are met, this solution can be installed in one of two ways:

  • On your Dataiku instance, click + New Project > Industry solutions > Retail > Market Basket Analysis.

  • Download the .zip project file and upload it directly to your Dataiku instance as a new project.

Note

If running a DSS 9 instance, the solution is found by navigating to + New Project > Sample projects > Solutions > Market Basket Analysis

Installing on a Dataiku Online Instance

Dataiku Online customers can add this Solution to their managed instance from the Launchpad: Features > Add A Feature > Extensions > Market Basket Analysis

Data Requirements

The Dataiku Flow was initially built using publicly available data. However, this project is meant to be used with your own data, which can be uploaded using the Dataiku Application. A historical transactions dataset is mandatory to run the project, and each row of the dataset should contain the following columns (an illustrative loading sketch follows the list):

  • The Description column describes the item and is later used as the item identifier.

  • The InvoiceNo column serves as the transaction identifier.

  • The InvoiceDate column contextualizes the purchase of an item, within a transaction, on a given date.

  • Optionally, the transactions_dataset can also include columns containing

    • CustomerID for incorporating customer data

    • Country to contextualize where transactions occurred.
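
As a quick illustration, the snippet below loads a hypothetical CSV export and checks it against this schema. This is a minimal sketch, not part of the solution: the file name is an assumption, pandas is assumed to be available in the code environment, and dateparser (one of the solution’s required packages) is used to normalize the date strings.

import pandas as pd
import dateparser

# Illustrative only: "transactions.csv" is a hypothetical export.
transactions = pd.read_csv("transactions.csv")

# Validate the schema described above.
required = {"Description", "InvoiceNo", "InvoiceDate"}
missing = required - set(transactions.columns)
if missing:
    raise ValueError(f"Missing required columns: {missing}")

# dateparser copes with heterogeneous date strings (e.g. "12/1/2010 08:26").
transactions["InvoiceDate"] = transactions["InvoiceDate"].map(dateparser.parse)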

Workflow Overview

You can follow along with the sample project in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow Zones.

The project has the following high level steps:

  1. Connect your data as an input and select your analysis parameters via the Dataiku Application.

  2. Ingest and pre-process the data to be compatible with the association rules computation.

  3. Compute the association rules and filter the most relevant rules for better consumption downstream.

  4. Identify products to be recommended to customers based on their past transactions.

  5. Interactively visualize the most frequently bought items, and the products associated with them for smarter product recommendations.

Walkthrough

Note

In addition to reading this document, it is recommended to read the project wiki before beginning in order to get a deeper technical understanding of how this solution was created, the different types of data enrichment available, explanations of solution-specific vocabulary, and suggested future directions for the solution.

Plug and play with your own data and parameter choices

To begin, you will need to create a new instance of the Market Basket Analysis Dataiku Application. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance.

Screenshot showing how to create a new App instance.

Once the new instance has been created, you can walk through the steps of the Application to add your data and select the analysis parameters to be run.

  1. In the Inputs section of the Application, reconfigure the connection parameters of the Flow. By default, the solution works with datasets in a filesystem connection. To connect the solution to your own transaction data, ask your admin for the connection type and schema to enter in the Application parameters. Once completed, the RECONFIGURE button will rebuild the full Flow to work with your data. Following reconfiguration, you can refresh the webpage, then search for and test the transactions dataset.

  2. Once your data has been uploaded, it needs to be preprocessed before association rules are identified. Within the Transactions preprocessing section of the App, we can define how we want our transactions dataset to be transformed. Specifically, it is here where we can map the schema of our input transaction dataset to the solution-defined schema (see the Data Requirements section above). Additionally, we can filter the data to a specified period of time, clarify how the dates are formatted, and filter transactions based on a minimum number of items purchased (a minimal sketch of these filters follows this list).

  3. With our data filtered and formatted correctly, we are ready to move to Association rules computation to define the parameters of the association rule learning process. A more detailed rules process can be applied by selecting Compute refined rules and inputting the desired rules scope (e.g. Country). The remaining parameters of this section define how association rules are calculated (minimum number of transactions to be considered, itemset frequency) and filtered (confidence, lift, and conviction thresholds).

  4. In order to recommend items to customers, several parameters need to be set in the Customers recommendations section. First and foremost, we have to define the prioritization of rules metrics to be applied to customers (i.e. confidence, lift, conviction, or support). We can also filter customer transaction dates and set a maximum number of transactions per customer to keep in the analysis.

  5. We offer two final sections to make the Dataiku App more production-ready. The Build specific flow parts section allows us to build only the parts of the solution that are relevant to our needs at the moment (i.e. if we only want to rebuild customer recommendations, we shouldn’t need to rebuild the whole Flow). The Automation section activates pre-built scenarios in order to refresh the project with new data over time.
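
To make step 2 more concrete, here is a minimal pandas sketch of the two transaction filters described above. It is an illustration under assumptions rather than the solution’s actual recipes, and it reuses the hypothetical transactions dataframe from the Data Requirements example.

import pandas as pd

def preprocess_transactions(df, start, end, min_items=2):
    # Keep only the analysis period selected in the Application.
    in_window = (df["InvoiceDate"] >= pd.Timestamp(start)) & \
                (df["InvoiceDate"] <= pd.Timestamp(end))
    df = df[in_window]
    # Drop transactions with fewer than `min_items` purchased items.
    basket_size = df.groupby("InvoiceNo")["Description"].transform("count")
    return df[basket_size >= min_items]

# Example: keep 2011 transactions containing at least two items.
preprocessed = preprocess_transactions(transactions, "2011-01-01", "2011-12-31")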

Dataiku screenshot of part of the Dataiku Application for Market Basket Analysis

Once you’ve built all elements of the Dataiku Application, you can either continue to the Project View to explore the generated datasets or go straight to the Dashboards and webapps to visualize the data. If you’re mainly interested in the visual components of this pre-packaged solution, feel free to skip over the next section.

Under the Hood: What happens in the Dataiku Application’s underlying Flow?

The Dataiku Application is built on top of a Dataiku Flow that has been optimized to accept input datasets and respond to your selected parameters. Let’s quickly walk through the different Flow zones to get an idea of how this was done.

  • The inputs zone contains a single dataset, which is populated by ingesting the transactions table defined in the Inputs section of the Application. By default, it contains the publicly available dataset we have provided.

  • The transactions_preprocessing Flow zone looks a bit intimidating but is, in reality, a series of visual steps that clean the data and filter it based on the parameters set in the Dataiku Application. It outputs the transactions_preprocessed dataset, which is used for the association rules computation.

  • The association_rules_computation zone generates all learned association rules between items based on the transaction history and the defined Application parameters. The five resulting datasets represent the learned association rules, the identified itemsets (products bought together), consequents, antecedents, and a general summary. The project wiki goes into great detail on the different datasets and their contents.

  • As you will see when exploring the output datasets generated by the association rules computation, it’s possible to end up with a large number of learned rules. It’s therefore important to filter the most relevant rules based on our self-defined thresholds. As a reminder, we can filter rules based on four metrics selected in the Dataiku Application (see the mlxtend sketch after this list):

    • Support: the proportion of transactions that contain both itemsets (e.g. tomato and onion).

    • Confidence: the conditional probability of purchasing one itemset (e.g. tomato) given the presence of another (e.g. onion).

    • Lift: the strength of the dependency between itemsets, i.e. how much more often they are bought together than would be expected if they were independent.

    • Conviction: the degree to which a rule’s outcome depends on its trigger.

  • Three Flow zones are dedicated to creating recommendations based on the association rules we’ve chosen to use:

    • We begin with the recommendations_preprocessing zone, which uses the filtering parameters set in the Dataiku Application to filter on transaction dates before computing all distinct purchased items, identifying unique customers, and applying our association rules to identify all the associated items each customer could have purchased based on what they did purchase. This zone results in two product-oriented outputs: (1) repeat-purchase candidates and (2) cross-sell candidates.

    • The cross_sales_recommendations zone identifies products that are likely to be purchased together based on the learned association rules, so that we can personalize product recommendations online and place strongly associated items close together in our physical stores.

    • The repeat_purchase_recommendations zone also identifies products using association rules, but in this case it focuses on items that are likely to be re-purchased. This enables us to tailor promotions and marketing for our existing customer base.

  • Finally, the webapp_zone isolates the datasets required for the backend of this solution’s webapps.
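
As a rough illustration of what the rules computation does conceptually, the sketch below uses mlxtend (pinned in the solution’s code environment) to mine frequent itemsets and filter the resulting rules on the four metrics defined above. It is a minimal sketch, not the zone’s actual implementation: the preprocessed dataframe is the hypothetical one from the earlier examples, and all threshold values are arbitrary illustrations.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# One basket (list of item descriptions) per transaction.
baskets = preprocessed.groupby("InvoiceNo")["Description"].apply(list).tolist()

# One-hot encode the baskets into the boolean matrix mlxtend expects.
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(baskets).transform(baskets),
                      columns=encoder.columns_)

# `min_support` plays the role of the App's itemset-frequency parameter.
itemsets = apriori(onehot, min_support=0.01, use_colnames=True)

# Derive rules, then filter on confidence, lift, and conviction thresholds.
rules = association_rules(itemsets, metric="confidence", min_threshold=0.3)
rules = rules[(rules["lift"] >= 1.2) & (rules["conviction"] >= 1.05)]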

Further explore your association rules with shareable visualizations

The Market Basket Analysis solution comes with a prebuilt dashboard containing:

  • 3 slides with visualizations built with Dataiku charts to make the project’s datasets easier to consume

  • 2 slides with interactive webapps to allow us to explore the association rules and product recommendations derived from our transactions data

Note

Both the charts and webapps are impacted by the parameters selected in the Dataiku Application, so final renderings in your own projects may differ as a result.

The dashboard charts give us a variety of visual ways to understand our transactions dataset both before and after preprocessing. These visualizations alone can give us an overview of the transactions impacting our Market Basket Analysis, help us identify purchasing patterns, explain the origin of certain association rules, and guide the tuning of the Dataiku Application’s parameters to more easily find association rules.

Dataiku screenshot showing some of the charts available in the pre-built dashboard

The final two slides of the dashboard contain two webapps: Items frequency analysis and Rules browser. If you are unable to interact with a webapp within the dashboard, you might need to start or restart the webapp backend. This can be done from the webapp menu or by running the restart_webapp_backend scenario.

Let’s first take a look at the Items frequency analysis webapp, which allows us to analyze the support of our most frequent items. The images in this article use the “Country” column of our transactions dataset as the association rules scope. Within this webapp, additional helpers can be expanded to better understand the wording used throughout.

Dataiku screenshot of the items frequency analysis Webapp using sample filters

If a rules scope was configured in the Dataiku Application, you will need to choose at least one rules scope (e.g. one Country). We can additionally choose to focus on items that are common to a specified number of rules and/or select specific items from the full list of frequent items identified in our transactions dataset. A counter at the top of the webapp shows how many items out of the total item count match the filters we’ve set.
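
For reference, the support statistic this webapp displays for a single item can be expressed in a few lines of pandas. This is a hypothetical sketch reusing the preprocessed dataframe from the earlier examples, not the webapp’s backend code.

# Support of an item = share of transactions in which it appears.
n_transactions = preprocessed["InvoiceNo"].nunique()
item_support = (
    preprocessed.drop_duplicates(["InvoiceNo", "Description"])
    .groupby("Description")["InvoiceNo"].count()
    .div(n_transactions)
    .sort_values(ascending=False)
)
print(item_support.head(20))  # the most frequent items, as in the webapp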

Let’s end by taking a look at the Rules browser webapp found in the final slide of the dashboard. This webapp allows us to choose our most frequent items in order to browse the computed association rules linked to them. Once again, there are expandable helpers to clarify the language used in this webapp. We can interact with the webapp by first searching for and selecting one or more items (e.g. Blue Pen, Party Balloons). Selecting an item will allow us to visualize its associated rules on the right, split into Triggers (the antecedent of the rule, i.e. the item we selected) and Outcomes (the consequent of the rule, i.e. the items it is associated with). Hovering over an underlined value will provide a quick description of the metric.

Dataiku screenshot of the Rules browser webapp

We can browse the computed association rules more easily using the other interactive elements of the webapp by:

  • Switching between the Triggers and Outcomes tabs

  • Filtering rules based on a rule metric threshold

  • Ordering results (ascending or descending) by one of the rules metrics

Additionally, we can export the results of our interactive analysis as a CSV for further sharing and/or analysis.
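
The same browse, filter, and export pattern can be reproduced outside the webapp. Below is a minimal sketch applied to the hypothetical rules dataframe from the mlxtend example above; the threshold and file name are illustrative.

# Filter on a metric threshold, order the results, and export as CSV,
# mirroring the interactions available in the Rules browser webapp.
selected = rules[rules["lift"] >= 2.0]
selected = selected.sort_values("confidence", ascending=False)
selected.to_csv("selected_rules.csv", index=False)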

A short note on automation

As mentioned in the Dataiku Application section of this article, it is possible to automate the Flow of this solution to be triggered by new data, at a specific time, etc. All of these trigger parameters can be tuned in the Scenarios menu of the project. Additionally, reporters can be created to send messages to Teams, Slack, email, etc. to keep our full organization informed. These scenarios can also be run ad hoc as needed. Full details on the scenarios and project automation can be found in the wiki.