Solution | Customer Lifetime Value Forecasting#
Overview#
Business Case#
The consumer landscape continues to evolve in an increasingly competitive marketplace, placing customer loyalty under threat. As a result, it is imperative that brands attract and retain high-value customers through impactful initiatives that maximize their value.
Customer Lifetime Value (CLV) is fundamental for companies to understand, track, and work to increase customer value over time. Investing in the long-term retention of your most valuable customers can yield incredible results: a 5% increase in customer retention rates can increase profits by between 25% and 95%.
Forecasting CLV ultimately comes down to keeping your customers coming back to repurchase throughout their lifetime with your company. When CLV is combined with transactional data, demographic insight, and other key metrics like customer acquisition cost, you get an even clearer view of your most important customers, further empowering you with greater insight for action.
Dataiku’s Customer Lifetime Value solution is designed to address very practical and necessary applications for business users, enabling you to develop an understanding of your customer base, build customer groups, forecast customer lifetime value, and integrate all of the above in your sales and marketing strategies.
Installation#
The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.
Dataiku Cloud users should follow the instructions for installing solutions on cloud.
The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.
Once the Solution has been added to your space, move ahead to Data Requirements.
After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:
On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Customer Lifetime Value Forecasting.
Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.
Technical Requirements#
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 13.2+ instance.
The Dataiku permission “May develop plugin” is needed to modify the Project Setup post-installation.
A Python 3.9 code environment named `solution_clv_forecast` with the following required packages:

```
MarkupSafe<2.2.0
Jinja2>=2.11,<3.1
cloudpickle>=1.3,<1.6
flask>=1.0,<2.3
itsdangerous<2.1.0
lightgbm>=3.2,<3.3
scikit-learn==1.3.2
scikit-optimize>=0.7,<=0.10.1
scipy>=1.5,<1.11
statsmodels>=0.12.2,<0.14
Werkzeug<3.1
xgboost>=1.5.1,<2
tdigest>=0.5,<0.6
econml>=0.13,<0.16
pymc-marketing==0.4.2
plotly==5.23.0
```
Data Requirements#
The Dataiku Flow was initially built using publicly available data. However, this project is meant to be used with your own data which can be uploaded using the Project Setup. Below are the input datasets that the solution has been built with:
| Dataset | Description |
| --- | --- |
| transactions_history (mandatory) | Stores transactions with one line per transaction/customer/product. |
| customer_metadata (optional) | Contains a unique line for each customer_id and a column for each metadata value. The metadata are split into numerical and categorical metadata in the Project Setup so they can be used in the classification and regression models. |
| customer_rfm_segments (optional) | Segmentation of customers at the month level, provided by the Dataiku RFM Segmentation solution. |
Workflow Overview#
You can follow along with the solution in the Dataiku gallery.
The project has the following high-level steps:
Process the historical transactions data.
Enrich monthly transaction data with RFM segments.
Compute current and future CLV.
Train models to predict future CLV.
Assess model performance.
Visualize and interpret our analyses.
Walkthrough#
Note
In addition to reading this document, it is recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.
Identifying Customer Value in Past Transactions#
In order to assess and predict Customer Lifetime Value, we need, at minimum, our transaction history. Optionally including customer RFM segments and metadata enriches the model and its results. All datasets we decide to include can be connected via the first section of the Project Setup and will be made available in the Data input Flow zone.
Once our data has been connected, the Project Setup prompts us to choose a time frame to apply to our historical data, which scopes how much history the model will consider for training. Within the Monthly Data preparation Flow zone, we aggregate the data at a monthly level; the values input to the second section of the Project Setup should therefore be expressed in months so that there is a common scale for all customers within the transaction history.
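To make the monthly aggregation step concrete, here is a minimal pure-Python sketch. The column names (`customer_id`, `order_date`, `amount`) and the toy records are assumptions for illustration, not the solution's actual schema or recipe:

```python
from collections import defaultdict
from datetime import date

# Toy transactions: one line per transaction/customer (assumed schema).
transactions = [
    {"customer_id": "C1", "order_date": date(2023, 1, 5),  "amount": 40.0},
    {"customer_id": "C1", "order_date": date(2023, 1, 20), "amount": 60.0},
    {"customer_id": "C1", "order_date": date(2023, 3, 2),  "amount": 25.0},
    {"customer_id": "C2", "order_date": date(2023, 2, 11), "amount": 80.0},
]

# Roll up to one record per customer per month: total spend and count.
monthly = defaultdict(lambda: {"total": 0.0, "n_transactions": 0})
for t in transactions:
    key = (t["customer_id"], t["order_date"].strftime("%Y-%m"))
    monthly[key]["total"] += t["amount"]
    monthly[key]["n_transactions"] += 1

print(monthly[("C1", "2023-01")])  # {'total': 100.0, 'n_transactions': 2}
```

In the solution itself, this aggregation is handled by visual recipes in the Monthly Data preparation Flow zone; the sketch only shows the shape of the transformation.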
The Window Features Computation Flow zone uses Window recipes to split the data between the lookback window, current window, and forward window of time. If we include RFM segment data, the Data Enrichment Flow zone is run to merge monthly data with RFM data per customer. Enriching our data with RFM segments enables us to identify customers with similar lifetime value and apply specific marketing campaigns per group, or to study group behavior. Final steps are taken within data preparation to split the data into train/test/validation sets for the models.
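The lookback/current/forward split can be pictured with a simplified stand-in for the Window recipes: given a chosen reference month, earlier months feed the lookback features, the reference month is the current window, and later months form the forward window the models try to predict. The function below is an illustrative assumption, not the solution's implementation:

```python
from datetime import date

def window_label(txn_month: date, reference_month: date) -> str:
    """Assign a month to the lookback, current, or forward window
    relative to a chosen reference month (simplified illustration)."""
    if txn_month < reference_month:
        return "lookback"
    if txn_month == reference_month:
        return "current"
    return "forward"

months = [date(2023, m, 1) for m in (1, 2, 3, 4, 5)]
labels = [window_label(m, date(2023, 3, 1)) for m in months]
print(labels)  # ['lookback', 'lookback', 'current', 'forward', 'forward']
```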
Star Customers - Predicting Customer Lifetime Value#
Before we train models, we can interact with the Project Setup in order to specify the optimization strategy to use for training, clustering options, model parameters, train/test configuration, and computation of additional analytics. The values chosen for these parameters will impact model performance, and thus the Project Setup provides an easy way to adjust, play with, and validate the performance of our CLV forecasting models based on the parameters we select.
Through the Project Setup and provided pre-built dashboards, retail data scientists can more easily test out modeling strategies until an optimal result is achieved. Furthermore, as data will impact model performance over time, this solution can be used to tune parameters over time to respond to changes in real data.
Three models are trained within this solution to support Customer Lifetime Value Forecasting. The first model is a Lifetimes statistical model, which uses the customer’s age, recency, frequency, and average monetary value per transaction to build an estimation of the customer’s number of transactions with a given value. The predictions from this model are an important input for the classification and regression models we will train next. Additionally, the outputs of this model serve as a reference for performance evaluation later on.
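A Lifetimes-style statistical model consumes, per customer, the frequency of repeat purchases, the recency (customer age at the last purchase), the total observed age T, and the average monetary value of repeat purchases. As a hedged sketch of how those inputs can be derived from raw transaction dates (field names and the days-based units are assumptions; the solution works at a monthly grain):

```python
from datetime import date

def rfm_summary(transactions, observation_end):
    """Compute the inputs a BG/NBD-style model expects from one
    customer's (purchase_date, amount) pairs. Simplified illustration,
    not the solution's exact recipe."""
    transactions = sorted(transactions)
    dates = [d for d, _ in transactions]
    first, last = dates[0], dates[-1]
    frequency = len(transactions) - 1            # repeat purchases only
    recency = (last - first).days                # age at last purchase
    T = (observation_end - first).days           # total observed age
    repeat = [a for _, a in transactions[1:]]    # monetary value excludes
    monetary = sum(repeat) / len(repeat) if repeat else 0.0  # first purchase
    return {"frequency": frequency, "recency": recency,
            "T": T, "monetary": monetary}

summary = rfm_summary(
    [(date(2023, 1, 5), 40.0), (date(2023, 2, 1), 60.0), (date(2023, 3, 1), 50.0)],
    observation_end=date(2023, 6, 1),
)
print(summary)  # frequency=2, recency=55, T=147, monetary=55.0
```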
Once our Lifetimes model is trained, we train two different VisualML models to predict the future CLV. The best model will depend on your specific needs: the classification model is used to predict the future CLV cluster of a customer, whereas the regression model predicts the future CLV of a customer. In both cases, the predicted CLV groups are compared, but depending on our data and optimization strategy, the model results will differ. On the data we used to build this solution, we found that the classification algorithm gave better performance.
Despite having already trained three models, we’re not done yet in predicting Customer Lifetime Value! As a final step in the modeling, we apply all three models, as well as CLV group clustering, to our inference data (the last full month of available data). Doing this gives us the full scope of future CLV groups for our active customers.
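The CLV group clustering step can be illustrated with a simple threshold-based bucketing. The group names and cut-off values below are assumptions for illustration only; in the solution, the clustering options are configured in the Project Setup:

```python
def clv_group(value, thresholds=(50.0, 200.0, 500.0)):
    """Bucket a predicted CLV into ordered groups -- a simplified
    stand-in for the solution's configurable CLV clustering."""
    labels = ["low", "medium", "high", "star"]
    for label, cut in zip(labels, thresholds):
        if value < cut:
            return label
    return labels[-1]

print([clv_group(v) for v in (10, 120, 800)])  # ['low', 'medium', 'star']
```

Bucketing both the actual and predicted CLV this way is what allows groups, rather than raw values, to be compared across the classification and regression models.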
Assessing Model Performance#
An important step in any predictive analytics project is the assessment of our model performance. Several recipes are applied to the outputs of our model in order to transform the data into visualization-ready datasets via the Dashboard Management Flow zone.
The results of this Flow zone are visualized in two dashboards. The first, the Data Science Validation Dashboard, should be used by data scientists to compare performance across all models and cross-check for unexpected behavior. Additionally, this dashboard provides the ability to compare actual CLV with the prediction, to identify any effects of time or of low-transaction customers on the model.
Finally, we can compare predicted distributions between models and evaluate the errors in the predictions of the CLV groups using charts on this pre-built dashboard. In addition to this dashboard, it’s recommended to take advantage of Dataiku’s built-in VisualML capabilities, such as subpopulation analysis, partial dependence plots, and more.
Explore our Predictions and Customer Behavior#
Once we’re satisfied with the accuracy and performance of our models, the Business Insights Dashboard can be used to explore the final outputs of our solution, make business decisions based on what we see, and share these visualizations with the rest of the organization for smarter decision making.
Several visualizations are provided in this dashboard in order to summarize our transactions history and customer base, contextualize the chosen time scope, and provide an average monthly value per customer in each CLV group.
Furthermore, we can use this dashboard to compare the distribution of customers and values across groups and explore the most common transitions between CLV groups (i.e., current to predicted group). This can be used to evaluate possible factors impacting changes in Customer Lifetime Value, which can be used to design more impactful marketing and customer outreach campaigns.
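A transition view like this boils down to counting (current group, predicted group) pairs across customers. A minimal sketch, using assumed group names and illustrative data only:

```python
from collections import Counter

# One (current_group, predicted_group) pair per customer -- toy data.
customers = [
    ("medium", "high"), ("medium", "medium"), ("low", "medium"),
    ("high", "high"), ("low", "low"), ("medium", "high"),
]

# Count each transition between current and predicted CLV groups.
transitions = Counter(customers)
for (cur, pred), n in sorted(transitions.items()):
    print(f"{cur:>6} -> {pred:<6} {n}")
```

Here the most common transition is medium -> high (2 customers); in the dashboard, the equivalent view highlights which group movements dominate and so where outreach may have the most impact.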
A short note on automation#
It is possible to automate the Flow of this solution to be triggered based on new data, a specific time, etc., via the Project Setup. All of these trigger parameters can be tuned in the Scenarios menu of the project.
Additionally, reporters can be created to send messages to Teams, Slack, email, etc. to keep our full organization informed. These scenarios can also be run ad-hoc as needed. Full detail on the scenarios and project automation can be found in the wiki.
Reproducing these Processes With Minimal Effort For Your Own Data#
The intent of this project is to enable customer success teams to understand how Dataiku can be used to assess the value of their existing customer base and make smarter decisions on customer outreach, marketing campaigns, and much more.
By creating a singular solution that can benefit and influence the decisions of a variety of teams in a single retail organization, smarter and more holistic strategies can be designed in order to maximize sales, while keeping customer outreach and acquisition costs down.
We’ve provided several suggestions on how CLV can be calculated, predicted, classified, and used, but ultimately the “best” approach will depend on your specific needs and your data. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.