Solution | Customer Lifetime Value Forecasting#

Overview#

Business Case#

The consumer landscape continues to evolve in an increasingly competitive marketplace, placing customer loyalty under threat. As a result, it is imperative that brands attract and retain high-value customers through impactful initiatives that maximize their value. Customer Lifetime Value (CLV) is fundamental for companies to understand, track, and work to increase customer value over time. Investing in the long-term retention of your most valuable customers can yield incredible results: a 5% increase in customer retention rates can increase profits by between 25% and 95%.

Forecasted CLV boils down to keeping your customers coming back again and again to repurchase over their lifetime with your company. When CLV is combined with various transactional data, demographic insight and other key metrics like customer acquisition cost, you get an even clearer view of your most important customers, further empowering you with greater insight for action.

Dataiku’s Customer Lifetime Value solution is designed to address very practical and necessary applications for business users, enabling you to develop an understanding of your customer base, build customer groups, forecast customer lifetime value, and integrate all of the above in your sales and marketing strategies.

Installation#

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:

  1. On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Customer Lifetime Value Forecasting.

  2. Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Additional note for 12.1+ users

If you are using a Dataiku 12.1+ instance and are missing the technical requirements for this Solution, the popup below will appear, allowing admin users to easily install the requirements, and non-admin users to request installation of the code environments and/or plugins the Solution needs on their instance.

Admins can process these requests in the admin request center, after which non-admin users can re-trigger the installation of the Solution.

Screenshot of the Request Install of Requirements menu available for Solutions.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.5+ instance.

  • The Dataiku permission “May develop plugin” is needed to modify the Dataiku Application post-installation.

  • A Python 3.8 code environment named solution_clv_forecast with the following required packages:

MarkupSafe<2.1.0
cloudpickle==1.3.0
flask>=1.0,<1.1
jinja2>=2.10,<2.11
lifetimes==0.11.3
lightgbm>=3.2,<3.3
matplotlib==3.3.4
nbformat==5.1.3
plotly==5.13.0
scikit-learn==1.3.2
scikit-optimize==0.10.1
scipy==1.10.1
statsmodels>=0.10,<0.11
xgboost==0.82

Data Requirements#

The Dataiku Flow was initially built using publicly available data. However, this project is meant to be used with your own data, which can be uploaded using the Dataiku Application. Below are the input datasets that the solution has been built with:

| Dataset | Description |
| --- | --- |
| transactions_history (mandatory) | Stores transactions, with one line per transaction/customer/product. |
| customer_metadata (optional) | Contains one line for each customer_id and one column for each metadata value. The metadata are split into numerical and categorical metadata in the Dataiku Application so they can be used in the classification and regression models. |
| customer_rfm_segments (optional) | Segmentation of customers at the month level, provided by the Dataiku RFM Segmentation solution. |
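
To make the expected inputs concrete, here is a minimal, hypothetical sketch of the shape these datasets take. The column names (customer_id, product_id, transaction_date, amount, age, channel) are assumptions for illustration only; your own columns may differ.

```python
import pandas as pd

# Hypothetical illustration of the expected input shapes (column names are
# assumptions, not the solution's required schema).
transactions_history = pd.DataFrame({
    "customer_id": ["C001", "C001", "C002"],
    "product_id": ["P10", "P22", "P10"],
    "transaction_date": pd.to_datetime(["2023-01-05", "2023-02-11", "2023-01-20"]),
    "amount": [35.0, 12.5, 80.0],      # one line per transaction/customer/product
})

customer_metadata = pd.DataFrame({
    "customer_id": ["C001", "C002"],   # one line per customer_id
    "age": [34, 51],                   # numerical metadata
    "channel": ["web", "store"],       # categorical metadata
})
```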

Workflow Overview#

You can follow along with the solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Process the historical transactions data.

  2. Enrich monthly transaction data with RFM segments.

  3. Compute current and future CLV.

  4. Train models to predict future CLV.

  5. Assess model performance.

  6. Visualize and interpret our analyses.

Walkthrough#

Note

In addition to reading this document, it is recommended to read the project wiki before beginning, in order to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Identifying Customer Value in Past Transactions#

In order to assess and predict Customer Lifetime Value, we need, at minimum, our transactions history. Optionally including our customer RFM segments and metadata enriches the model and its results. All datasets we decide to include can be connected via the first section of the Dataiku Application. To begin, you will need to create a new instance of the Customer Lifetime Value Forecasting Dataiku Application. This can be done by selecting the Dataiku Application from your instance home and clicking Create App Instance. Several Application instances can be created if you want to change how CLV is predicted based on different data or different parameters. The datasets we choose to include will be made available in the Data input Flow zone.

Dataiku screenshot of part of the Dataiku Application for CLV Forecasting

Once our data has been connected, the Dataiku Application prompts us to choose a time frame to apply to our historical data, scoping how much history the model will consider for training. The Monthly Data preparation Flow zone aggregates the data at a monthly level, so the values entered in the second section of the App should be expressed in months, giving a common scale for customers within the transactions history. The Window Features Computation Flow zone uses Window recipes to split our data into a lookback window, a current window, and a forward window of time. If we include RFM segment data, the Data Enrichment Flow zone is run to merge our monthly data with RFM data per customer. Enriching our data with RFM segments lets us identify customers with similar lifetime value and apply specific marketing campaigns per group, or study group behavior. Final data preparation steps split our data into train/test/validation sets for the models.
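
As a rough illustration of this preparation logic (the solution implements it with visual Prepare and Window recipes, not Python), here is a hypothetical pandas sketch; the column names and window boundaries are assumptions, not the App defaults.

```python
import pandas as pd

# Hypothetical transaction log (column names are illustrative).
transactions = pd.DataFrame({
    "customer_id": ["C001", "C001", "C002", "C002"],
    "transaction_date": pd.to_datetime(
        ["2022-11-03", "2023-02-11", "2023-03-20", "2023-08-02"]),
    "amount": [35.0, 12.5, 80.0, 22.0],
})
transactions["month"] = transactions["transaction_date"].dt.to_period("M").dt.to_timestamp()

# Aggregate to one row per customer per month.
monthly = (
    transactions.groupby(["customer_id", "month"], as_index=False)
    .agg(n_transactions=("amount", "size"), monthly_value=("amount", "sum"))
)

# Split the timeline into lookback / current / forward windows
# (the boundaries below are illustrative).
current_start, forward_start = pd.Timestamp("2023-01-01"), pd.Timestamp("2023-07-01")
lookback = monthly[monthly["month"] < current_start]
current = monthly[(monthly["month"] >= current_start) & (monthly["month"] < forward_start)]
forward = monthly[monthly["month"] >= forward_start]
```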

Star Customers - Predicting Customer Lifetime Value#

Before we train the models, we can interact with the Dataiku Application to specify the optimization strategy used for training, clustering options, model parameters, train/test configuration, and the computation of additional analytics. The values chosen for these parameters will impact model performance, so the Dataiku Application gives us an easy way to adjust, experiment with, and validate the performance of our CLV forecasting models based on the parameters we select. Through the Dataiku App and the provided pre-built dashboards, retail Data Scientists can more easily test out modeling strategies until an optimal result is achieved. Furthermore, as incoming data will impact model performance over time, this solution can be used to re-tune parameters in response to changes in real data.

Three models are trained within this solution to support Customer Lifetime Value Forecasting. The first is a Lifetimes statistical model, which uses the customer's age, recency, frequency, and average monetary value per transaction to estimate the number of transactions a customer will make and their value. The predictions from this model are an important input for the classification and regression models we will train next. Additionally, the outputs of this model serve as a reference for performance evaluation later on.

Dataiku screenshot of part of the training of our Lifetimes Statistical Model
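
For reference, a minimal standalone sketch of this kind of model with the lifetimes package (pinned in the solution's code environment) is shown below. It uses the sample summary dataset shipped with the package; a real transaction log can be converted to the same recency/frequency/monetary summary with lifetimes.utils.summary_data_from_transaction_data. The penalizers, horizon, and discount rate are illustrative, not necessarily the solution's settings.

```python
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.datasets import load_cdnow_summary_data_with_monetary_value

# One row per customer: frequency, recency, T (customer age), monetary_value.
rfm = load_cdnow_summary_data_with_monetary_value()
repeat = rfm[rfm["frequency"] > 0]  # Gamma-Gamma is fit on repeat customers only

# BG/NBD model: expected number of future transactions per customer.
bgf = BetaGeoFitter(penalizer_coef=0.01)
bgf.fit(rfm["frequency"], rfm["recency"], rfm["T"])

# Gamma-Gamma model: expected value per transaction, combined into a CLV estimate.
ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(repeat["frequency"], repeat["monetary_value"])

clv = ggf.customer_lifetime_value(
    bgf,
    rfm["frequency"], rfm["recency"], rfm["T"], rfm["monetary_value"],
    time=6,             # forecast horizon in months
    discount_rate=0.01, # monthly discount rate
)
print(clv.head())
```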

Once our Lifetimes model is trained, we train two different VisualML models to predict future CLV. The best model will depend on your specific needs: the Classification model predicts the future CLV cluster of a customer, whereas the Regression model predicts the future CLV value of a customer. In both cases the predicted CLV groups are compared, but the results will differ depending on our data and optimization strategy. On the data we used to build this solution, we found that the classification algorithm gave better performance.
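
These two models are built in Dataiku's VisualML rather than in code. Purely to illustrate the two framings, here is a hypothetical LightGBM sketch on synthetic data; in the solution, the features come from the lookback window (transaction aggregates, Lifetimes outputs, optional RFM segments and metadata), and the CLV groups are defined by the clustering options chosen in the App.

```python
import numpy as np
from lightgbm import LGBMClassifier, LGBMRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in features and targets (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
future_clv = np.exp(3 + X @ np.array([0.8, 0.4, 0.2, 0.0, 0.0])
                    + rng.normal(scale=0.3, size=1000))
clv_group = np.digitize(future_clv, np.quantile(future_clv, [0.5, 0.8, 0.95]))

X_train, X_test, y_reg_train, y_reg_test, y_clf_train, y_clf_test = train_test_split(
    X, future_clv, clv_group, test_size=0.2, random_state=0)

# Regression framing: predict the future CLV amount directly.
reg = LGBMRegressor(n_estimators=200).fit(X_train, y_reg_train)

# Classification framing: predict the future CLV group of a customer.
clf = LGBMClassifier(n_estimators=200).fit(X_train, y_clf_train)
```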

Despite having already trained three models, we're not done yet in predicting Customer Lifetime Value! As a final step in the modeling, we apply all three models, as well as CLV group clustering, to our Inference data (the last full month of available data). Doing this gives us the full picture of future CLV groups for active customers.
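
How the CLV groups themselves are defined depends on the clustering options selected in the Dataiku Application. As one hypothetical approach, predicted CLV values could be clustered into ordered groups with scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the predicted CLV values of active customers.
rng = np.random.default_rng(0)
predicted_clv = np.exp(rng.normal(3, 1, size=500))

# Cluster the 1-D CLV values into 4 groups.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
raw_labels = kmeans.fit_predict(predicted_clv.reshape(-1, 1))

# Relabel so that group 0 is the lowest-value group and group 3 the highest.
order = np.argsort(kmeans.cluster_centers_.ravel())
clv_group = np.argsort(order)[raw_labels]
```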

Assessing Model Performance#

An important step in any predictive analytics project is the assessment of model performance. Several recipes are applied to the outputs of our models in the Dashboard Management Flow zone to transform the data into visualization-ready datasets. The results of this Flow zone are visualized in two Dashboards. The first, the Data Science Validation Dashboard, should be used by Data Scientists to compare performance across all models and cross-check for unexpected behavior. Additionally, this dashboard makes it possible to compare actual CLV with the prediction, to identify any effects of time or of low-transaction customers on the model. Finally, we can compare predicted distributions between models and evaluate the errors in the predictions of the CLV groups using charts on this pre-built dashboard. In addition to this Dashboard, it's recommended to take advantage of Dataiku's built-in VisualML capabilities such as Subpopulation Analysis, Partial Dependence Plots, and more.

Dataiku screenshot of visualizations in the Data Science Validation Dashboard to assess model performance
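
If you prefer to complement the dashboard with code, the kind of checks it supports (error on the predicted CLV amount, confusion between actual and predicted CLV groups) can be sketched as follows; the arrays and quantile-based groups below are stand-ins, not the project's evaluation datasets.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, accuracy_score, confusion_matrix

# Stand-in actual and predicted CLV values.
rng = np.random.default_rng(1)
actual_clv = np.exp(rng.normal(3, 1, size=400))
predicted_clv = actual_clv * rng.normal(1.0, 0.25, size=400)

# Bin both into CLV groups using the same thresholds.
bins = np.quantile(actual_clv, [0.5, 0.8, 0.95])
actual_group = np.digitize(actual_clv, bins)
predicted_group = np.digitize(predicted_clv, bins)

print("MAE on CLV:", mean_absolute_error(actual_clv, predicted_clv))
print("CLV group accuracy:", accuracy_score(actual_group, predicted_group))
print(confusion_matrix(actual_group, predicted_group))
```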

Explore our Predictions and Customer Behavior#

Once we’re satisfied with the accuracy and performance of our models, the Business Insights Dashboard can be used to explore the final outputs of our solution, make business decisions based on what we see, and share these visualizations with the rest of the organization for smarter decision making. Several visualizations are provided in this dashboard to summarize our transactions history and customer base, contextualize the chosen time scope, and provide an average monthly value per customer in each CLV group. Furthermore, we can use this dashboard to compare the distribution of customers and values across groups and explore the most common transitions between CLV groups (i.e., current to predicted group). This can be used to evaluate possible factors impacting changes in Customer Lifetime Value and to design more impactful marketing and customer outreach campaigns.

Dataiku screenshot of summary data in the Business Insights Dashboard used to explore our CLV predictions
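
The transition analysis in particular can be reproduced outside the dashboard with a simple cross-tabulation; the column names below are hypothetical stand-ins for each customer's current and predicted CLV group.

```python
import pandas as pd

# Hypothetical per-customer current and predicted CLV groups.
df = pd.DataFrame({
    "current_group": ["Low", "Low", "Mid", "High", "Mid", "Low"],
    "predicted_group": ["Low", "Mid", "Mid", "High", "High", "Low"],
})

# Share of customers moving from each current group to each predicted group.
transitions = pd.crosstab(df["current_group"], df["predicted_group"], normalize="index")
print(transitions)
```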

A short note on automation#

It is possible to automate the Flow of this solution to be triggered based on new data, a specific time, etc., via the Dataiku Application. All of these trigger parameters can be tuned in the Scenarios menu of the project. Additionally, reporters can be created to send messages to Teams, Slack, email, etc., to keep our full organization informed. These scenarios can also be run ad hoc as needed. Full details on the scenarios and project automation can be found in the wiki.
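
Scenarios are normally configured and triggered from the project itself, but an ad-hoc run can also be requested from outside Dataiku with the public API client. The sketch below assumes that approach; the host, API key, project key, and scenario id are placeholders to replace with your own values.

```python
import dataikuapi

# Placeholders: your instance URL, a personal API key, the project key of your
# solution instance, and the scenario id shown in the project's Scenarios menu.
client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
project = client.get_project("CLVFORECAST")
scenario = project.get_scenario("SCORE_NEW_DATA")

# Trigger the scenario and wait for it to finish.
scenario.run_and_wait()
```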

Reproducing these Processes With Minimal Effort For Your Own Data#

The intent of this project is to enable customer success teams to understand how Dataiku can be used to assess the value of their existing customer base and make smarter decisions on customer outreach, marketing campaigns, and much more.

By creating a singular solution that can benefit and influence the decisions of a variety of teams in a single retail organization, smarter and more holistic strategies can be designed in order to maximize sales, while keeping customer outreach and acquisition costs down.

We’ve provided several suggestions on how CLV can be calculated, predicted, classified, and used, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.