Solution | Lead Scoring#

Overview#

Business Case#

Lead scoring is an essential tool for optimizing marketing and sales efforts. Machine learning models are enhancing or replacing traditional models based on business rules and hard-coded segmentation.

These models leverage a wider array of data points and create far more granular and dynamic predictions of conversion rates and estimated revenues. This improves return on marketing spend and enables more proactive and precise targeting at increasingly early engagement stages.

Lead scoring enhancements are two-fold. First, they allow teams to better rank leads by quality, which can have significant impact. Second, they allow leads to be assessed more effectively for potential revenue generation after conversion, ensuring effort is expended on leads that are not simply likely to convert, but are also likely to generate solid revenue once onboarded.

Dataiku’s Lead Scoring solution offers sales and marketing teams an opportunity to efficiently adopt machine learning lead scoring techniques, and simultaneously unlock the power of internal and external data sources for greater lead insight.

The Dataiku platform ensures all of this valuable work can be seamlessly integrated into any existing marketing, CRM or related sales systems, creating a robust and complete marketing pipeline.

Installation#

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 13.2+ instance.

Data Requirements#

The project is initially shipped with all datasets using the filesystem connection.

The input data should be separated into four different datasets:

Dataset | Description
--- | ---
lead_touchpoints | Includes the different touch points that exist for each lead in the Lead Information Datasets.
historical_lead_information | Includes static information about historical leads and whether they converted to customers or not.
to_score_lead_information | Includes static information about leads to be scored.
customers_value | Includes a list of the actual customers with their respective value and static information. Static information must be included in the Lead Information Datasets to be used as features in the model.

Note

To build the customer value prediction model, at least one optional column must be added to the historical_lead_information, to_score_lead_information, and customers_value datasets. Additionally, to be used as features in this model, static information from the customers_value dataset must also exist in the to_score_lead_information dataset.
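The requirement in the note above can be checked before building the Flow. The following is a minimal sketch in plain Python, not part of the Solution itself. The column names (customer_id, value, industry, company_size, lead_id, country) and the helper missing_static_columns are hypothetical and only serve as an illustration. Replace them with the columns of your own datasets.

```python
def missing_static_columns(customer_columns, lead_columns, exclude=()):
    """Return the customer columns that are absent from the lead dataset.

    Columns listed in `exclude` (identifiers, the target value) are skipped
    because they are not expected to be used as features.
    """
    lead_set = set(lead_columns)
    return [c for c in customer_columns if c not in exclude and c not in lead_set]


# Hypothetical schemas, for illustration only.
customers_value_cols = ["customer_id", "value", "industry", "company_size"]
to_score_cols = ["lead_id", "industry", "country"]

missing = missing_static_columns(
    customers_value_cols,
    to_score_cols,
    exclude=("customer_id", "value"),  # identifier and target are not features
)
print(missing)  # ['company_size']
```

Here company_size exists only in the customer data, so it could not be used as a feature when scoring new leads until it is added to to_score_lead_information.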

Workflow Overview#

You can follow along with the solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow Zones.

The solution has the following high-level steps:

  1. Connect your data as input, build the Flow, and access the dashboard via the Project Setup.

  2. Explore the leads and historical customer data on the first and third pages of the dashboard.

  3. Analyze the lead conversion predictions on the second page.

  4. Analyze the outcomes of the customer value prediction model on the fourth page.

  5. On the last page, analyze the final value assigned to each lead.

Walkthrough#

Note

In addition to reading this document, we recommend reading the project wiki before beginning. It provides a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Plug and play with your own data and parameter choices#

To begin, you will need to configure the Project Setup, which you can access from the project home page.

The project is delivered with sample data that should be replaced with your own data, provided that it follows the data model described above.

This can be done in two ways:

  1. Data can be uploaded directly from your filesystem in the first section of the Dataiku app.

  2. Data can be connected to your database of choice by selecting an existing connection.

With either option, you must click the Check button, which loads the data and verifies the schema.

Important

Be sure to refresh the page so that the app can dynamically take your data into account.

Dataiku screenshot of the accompanying Project Setup for this solution.

With our data selected and loaded into the Flow, we can move to the following app sections:

  • Build Flow and score leads: This section allows you to enter the currency of the customer value for better visualization, then click Run to build the Flow and generate results.

Dataiku screenshot of the accompanying Project Setup for this solution.

Cleaning and Preparing our Historical Data#

In total, three Flow zones are involved in data preparation and cleaning for this solution. We won’t go into heavy detail about each Flow zone as this information can be found in the wiki of the project.

These Flow zones help construct the consolidated input datasets and clean the results for better interpretability.

Exploring Input Data#

To better understand the input data and verify that it is coherent, it’s important first to explore the historical datasets. Doing so allows us to identify the population distributions and trends of the historical customers and leads.

The Historical data analysis Flow zone computes all metrics and values needed to generate charts for the first and third pages of the Lead Scoring Dashboard.

Dataiku screenshot of the EDA pages of the Dashboard.

Predict Conversion Rate#

The Estimated Conversion Rate page presents the results of the classification model. This model is trained on information about historical leads, along with their final output (whether they converted or not), to assign a likelihood of conversion to new leads.

From these results, we are able to group new leads by decile, ranking them by the likelihood of conversion, and compute their corresponding expected conversion rate.
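To make the decile grouping concrete, here is a minimal sketch in plain Python. It is not the Solution's implementation. It assumes that an expected conversion rate can be estimated as the observed conversion rate of historical leads falling in the same score decile, and the function names and sample data are hypothetical.

```python
def assign_deciles(scores):
    """Map each score to a decile from 1 (highest scores) to 10 (lowest scores)."""
    n = len(scores)
    # Indices sorted by descending score; ties keep their input order.
    order = sorted(range(n), key=lambda i: -scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def conversion_rate_by_decile(scores, converted):
    """Observed conversion rate per decile for leads with a known outcome."""
    totals, wins = {}, {}
    for d, c in zip(assign_deciles(scores), converted):
        totals[d] = totals.get(d, 0) + 1
        wins[d] = wins.get(d, 0) + int(c)
    return {d: wins[d] / totals[d] for d in sorted(totals)}


# Hypothetical model scores for 20 historical leads (1.0 down to 0.05)
# and whether each lead actually converted (1) or not (0).
scores = [i / 20 for i in range(20, 0, -1)]
converted = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

rates = conversion_rate_by_decile(scores, converted)
print(rates[1], rates[2])  # 1.0 0.5
```

With this approach, a new lead whose score places it in the first decile would be given the conversion rate observed for that decile.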

The page provides a detailed view of the conversion model score and the expected conversion rate computed for the leads to score. Additionally, visual analyses from the classification model are presented at the bottom of the page.

Dataiku screenshot of the Estimated Conversion Rate page of the Dashboard.

Predict Customer Value#

The Estimated Customer Value page presents the results of the regression model. This model is trained on historical customer information to predict the estimated customer value of the new leads.

The page showcases the outcomes of the customer value prediction model. The predicted values are presented in the Customer Value Table, and further visual analyses derived from the regression model are featured at the bottom of the page.

Dataiku screenshot of the Estimated Customer Value page of the Dashboard.

Analyze Final Lead Value#

The Lead Value page displays the final value assigned to each lead, calculated by multiplying the probability of conversion by the expected customer value associated with each lead.

Lead Value = Estimated Conversion Rate * Estimated Customer Value
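As a worked example of this formula, the sketch below computes and ranks the lead value for three leads. The lead identifiers and figures are made up for illustration, and the helper lead_value is hypothetical rather than part of the Solution.

```python
def lead_value(conversion_rate, customer_value):
    """Lead Value = Estimated Conversion Rate * Estimated Customer Value."""
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return conversion_rate * customer_value


# Hypothetical leads, for illustration only.
leads = [
    {"lead_id": "A", "conversion_rate": 0.5, "customer_value": 1000.0},
    {"lead_id": "B", "conversion_rate": 0.25, "customer_value": 4000.0},
    {"lead_id": "C", "conversion_rate": 0.75, "customer_value": 800.0},
]

for lead in leads:
    lead["lead_value"] = lead_value(lead["conversion_rate"], lead["customer_value"])

# Rank leads from highest to lowest lead value.
ranked = sorted(leads, key=lambda l: l["lead_value"], reverse=True)
print([(l["lead_id"], l["lead_value"]) for l in ranked])
# [('B', 1000.0), ('C', 600.0), ('A', 500.0)]
```

Lead B is the least likely to convert but ranks first because of its higher estimated customer value, which is why the final score combines both predictions.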

Dataiku screenshot of the Lead Value page of the Dashboard.

Responsible AI Considerations#

The Lead Scoring solution is designed to serve as a powerful tool. It allows companies to tailor their marketing and sales strategies to the specific needs and preferences of each prospect, resulting in a more personalized and positive customer experience. Nevertheless, misusing this solution may lead to unintended consequences, such as the unfair treatment of certain prospects or customers.

To help detect such issues, tools to review the fairness of the models, the statistical parity, or the independence of the distributed errors are available within Dataiku.
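For readers unfamiliar with statistical parity, the sketch below shows one common way to express it: the difference in the rate of positive predictions between two groups. This is a generic illustration in plain Python, not Dataiku's tooling. The group labels, predictions, and function names are hypothetical.

```python
def positive_rate(predictions):
    """Share of positive (1) predictions in a list of 0/1 predictions."""
    return sum(predictions) / len(predictions)


def statistical_parity_difference(predictions, groups, group_a, group_b):
    """Positive prediction rate of group_a minus that of group_b.

    A value close to 0 means both groups receive positive predictions
    at a similar rate.
    """
    a = [p for p, g in zip(predictions, groups) if g == group_a]
    b = [p for p, g in zip(predictions, groups) if g == group_b]
    return positive_rate(a) - positive_rate(b)


# Hypothetical predictions (1 = predicted to convert) for two groups of leads.
predictions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["x", "x", "x", "x", "y", "y", "y", "y"]

print(statistical_parity_difference(predictions, groups, "x", "y"))  # 0.5
```

In this made-up example, leads in group x are predicted to convert at a rate of 0.75 against 0.25 for group y, a gap that would warrant a closer review of the model and its input features.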

Additionally, companies must adhere to and respect the compliance regulations established by their respective organizations.

Reproducing these Processes With Minimal Effort For Your Own Data#

The intent of this project is to enable business users to understand how Dataiku can be used to score leads by predicting their expected value and expected conversion rate.

We have provided several suggestions on how to use historical lead information and customer information to build lead analysis and assign a score value. However, the best approach will ultimately depend on your specific needs and the data of interest. If you are interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.