Solution | Lead Scoring#


Business Case#

Lead scoring is an essential tool for optimizing marketing and sales efforts. Traditional models based on business rules and hard-coded segmentation are being enhanced or replaced by machine learning models.

These models leverage a wider array of data points and create far more granular and dynamic predictions of conversion rates and estimated revenues. This improves return on marketing spend and enables ever more proactive and precise targeting at increasingly early engagement stages.

Lead scoring enhancements are two-fold. First, they allow teams to better rank leads by quality, which can have significant impact. Second, they allow leads to be assessed more effectively for potential revenue generation after conversion, ensuring effort is expended on leads that are not simply likely to convert, but are also likely to generate solid revenue once onboarded.

Dataiku’s Lead Scoring Solution offers sales and marketing teams an opportunity to efficiently adopt machine learning lead scoring techniques, and simultaneously unlock the power of internal and external data sources for greater lead insight.

The Dataiku platform ensures all of this valuable work can be seamlessly integrated into any existing marketing, CRM or related sales systems, creating a robust and complete marketing pipeline.


The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:

  1. On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Lead Scoring.

  2. Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Additional note for 12.1+ users

If you are using a Dataiku 12.1+ instance and are missing the technical requirements for this Solution, the popup below will appear. It lets admin users install the requirements directly, and lets non-admin users request installation of the code environments and/or plugins that the Solution needs.

Admins can process these requests in the admin request center, after which non-admin users can re-trigger the installation of the Solution.

Screenshot of the Request Install of Requirements menu available for Solutions.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.5+ instance.

Data Requirements#

The project is initially shipped with all datasets using the filesystem connection.

The input data should be separated into four different datasets:

  • Touch points dataset: Includes the different touch points that exist for each lead in the Lead Information Datasets.

  • historical_lead_information: Includes static information about historical leads and whether they converted to customers or not.

  • to_score_lead_information: Includes static information about leads to be scored.

  • customers_value: Includes a list of the actual customers with their respective value and static information. Static information must be included in the Lead Information Datasets to be used as features in the model.


To build the customer value prediction model, at least one optional column must be added to the historical_lead_information, to_score_lead_information, and customers_value datasets. Additionally, to be used as features in this model, static information from the customers_value dataset must also exist in the to_score_lead_information dataset.
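The column requirement above can be checked before building the Flow. The following is a minimal sketch in plain Python, not part of the Solution itself. Every column name in it (customer_id, customer_value, industry, company_size, lead_id, country) is hypothetical and stands in for your own schema.

```python
# Hypothetical column names, for illustration only; replace with your own schema.
def missing_value_features(customers_value_cols, to_score_cols, id_col, target_col):
    """Return the static columns of customers_value that are absent from the leads to score."""
    available = set(to_score_cols)
    # Static columns are everything except the identifier and the value to predict.
    static_cols = [c for c in customers_value_cols if c not in (id_col, target_col)]
    return [c for c in static_cols if c not in available]


customers_value_cols = ["customer_id", "customer_value", "industry", "company_size"]
to_score_cols = ["lead_id", "industry", "country"]

missing = missing_value_features(
    customers_value_cols,
    to_score_cols,
    id_col="customer_id",
    target_col="customer_value",
)
print(missing)  # ['company_size']
```

Any column reported as missing could not serve as a feature when scoring new leads, so it should either be added to to_score_lead_information or left out of the customer value model.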

Workflow Overview#

You can follow along with the Solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow Zones.

The Solution has the following high-level steps:

  1. Connect your data as input, build the Flow, and access the dashboard via the Dataiku Application.

  2. Explore the leads and historical customer data in the dashboard’s first and third slides.

  3. Analyze the lead conversion predictions in the second slide.

  4. Analyze the outcomes of the customer value prediction model in the fourth slide.

  5. In the last slide, analyze the final value assigned to each lead.



In addition to reading this document, we recommend reading the project wiki before you begin. It provides a deeper technical understanding of how this Solution was created, along with more detailed explanations of Solution-specific vocabulary.

Plug and play with your own data and parameter choices#

To begin, you will need to create a new instance of the Lead Scoring application. This can be done by selecting the Dataiku Application from your instance home, and clicking Create App Instance.

The project is delivered with sample data that should be replaced with your data, assuming that it adopts the data model described above.

This can be done in two ways:

  1. Data can be uploaded directly from your filesystem in the first section of the Dataiku app.

  2. Data can be connected to your database of choice by selecting an existing connection.

With either option, click the Check button, which loads the data and verifies the schema.


Be sure to refresh the page so that the app can dynamically take your data into account.

Dataiku screenshot of the accompanying Dataiku Application for this solution.

With our data selected and loaded into the Flow, we can move to the following app section:

  • Build Flow and score leads: This section allows you to enter the currency of the customer value for better visualization, and to click Run to build the Flow and generate results.

Dataiku screenshot of the accompanying Dataiku Application for this solution.

Cleaning and Preparing our Historical Data#

In total, three Flow zones are involved in data preparation and cleaning for this Solution. We won’t go into heavy detail about each Flow zone as this information can be found in the wiki of the project.

These Flow zones help construct the consolidated input datasets and clean the results for better interpretability.

Exploring Input Data#

To better understand the input data and verify that it is coherent, it’s important first to explore the historical datasets. Doing so allows us to identify the population distributions and trends of the historical customers and leads.

The Historical data analysis Flow zone computes all metrics and values needed to generate charts for the first and third slides of the Lead Scoring Dashboard.

Dataiku screenshot of the EDA slides of the Dashboard.

Predict Conversion Rate#

The Estimated Conversion Rate page presents the results of the classification model. This model is trained on information about historical leads, along with their final output (whether they converted or not), to assign a likelihood of conversion to new leads.

From these results, we are able to group new leads by decile, ranking them by the likelihood of conversion, and compute their corresponding expected conversion rate.
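The decile grouping described above can be sketched in plain Python. This is not the Solution's actual recipe: the ranking rule, the use of historical outcomes to estimate one conversion rate per decile, and all sample values are assumptions made for illustration.

```python
def assign_deciles(scores):
    """Map each score to a decile, from 1 (highest scores) to 10 (lowest scores)."""
    n = len(scores)
    # Indices of the leads, ordered from the highest score to the lowest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def conversion_rate_by_decile(deciles, converted):
    """Observed share of converted leads in each decile."""
    totals, hits = {}, {}
    for d, c in zip(deciles, converted):
        totals[d] = totals.get(d, 0) + 1
        hits[d] = hits.get(d, 0) + c
    return {d: hits[d] / totals[d] for d in sorted(totals)}


# Hypothetical historical leads: model score and known outcome (1 = converted).
hist_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50,
               0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00]
hist_converted = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1,
                  0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
rates = conversion_rate_by_decile(assign_deciles(hist_scores), hist_converted)

# Hypothetical new leads: rank them, then look up the rate of their decile.
new_scores = [0.10, 0.80, 0.55, 0.30, 0.95, 0.20, 0.65, 0.40, 0.05, 0.70]
new_deciles = assign_deciles(new_scores)
expected_rates = [rates[d] for d in new_deciles]

for score, decile, rate in zip(new_scores, new_deciles, expected_rates):
    print(f"score={score:.2f} decile={decile} expected_conversion_rate={rate:.2f}")
```

Estimating the rate from observed outcomes, rather than reading the raw model score as a probability, is one way to keep the expected conversion rate meaningful even when the score itself is not well calibrated.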

The slide provides a detailed view of the conversion model score and the expected conversion rate computed for the leads to score. Additionally, visual analyses from the classification model are presented at the bottom of the slide.

Dataiku screenshot of the Estimated Conversion Rate page of the Dashboard.

Predict Customer Value#

The Estimated Customer Value page presents the results of the regression model. This model is trained on historical customer information and predicts the estimated customer value of the new leads.

The predicted values are presented in the Customer Value Table, and further visual analyses derived from the regression model are featured at the bottom of the slide.

Dataiku screenshot of the Estimated Customer Value page of the Dashboard.

Analyze Final Lead Value#

The Lead Value page displays the final value assigned to each lead, calculated by multiplying the lead's probability of conversion by its expected customer value.

Lead Value = Estimated Conversion Rate * Estimated Customer Value
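As a worked illustration of this formula, the sketch below computes and ranks the lead value for three made-up leads. The lead identifiers, field names, and numbers are hypothetical and are not taken from the Solution's sample data.

```python
def lead_value(conversion_rate, customer_value):
    """Lead Value = Estimated Conversion Rate * Estimated Customer Value."""
    return conversion_rate * customer_value


# Hypothetical leads with their two model outputs.
leads = [
    {"lead_id": "A", "conversion_rate": 0.5, "customer_value": 1000.0},
    {"lead_id": "B", "conversion_rate": 0.125, "customer_value": 8000.0},
    {"lead_id": "C", "conversion_rate": 0.25, "customer_value": 1600.0},
]

for lead in leads:
    lead["lead_value"] = lead_value(lead["conversion_rate"], lead["customer_value"])

# Rank leads from the highest to the lowest lead value.
ranked = sorted(leads, key=lambda lead: lead["lead_value"], reverse=True)
for lead in ranked:
    print(lead["lead_id"], lead["lead_value"])
```

In this example, lead B is the least likely to convert but still ranks first, because its estimated customer value is high enough to outweigh the lower conversion rate.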

Dataiku screenshot of the Lead Value page of the Dashboard.

Responsible AI Considerations#

The Lead Scoring Solution is designed to serve as a powerful tool. It allows companies to tailor their marketing and sales strategies to the specific needs and preferences of each prospect, resulting in a more personalized and positive customer experience. Nevertheless, misusing this Solution may lead to unintended consequences, such as the unfair treatment of certain prospects or customers.

Tools to review the fairness of the models, the statistical parity, or the independence of the distributed errors are also available within Dataiku.

Additionally, companies must adhere to and respect the compliance regulations established by their respective organizations.
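To make the statistical parity check mentioned above more concrete, here is a minimal plain-Python sketch. It does not use or reproduce Dataiku's built-in fairness tooling. It assumes a hypothetical sensitive attribute (a region label) and a hypothetical flag marking leads that were prioritized, and measures the gap between the groups' selection rates.

```python
def selection_rates(groups, selected):
    """Share of selected leads within each group."""
    totals, hits = {}, {}
    for g, s in zip(groups, selected):
        totals[g] = totals.get(g, 0) + 1
        hits[g] = hits.get(g, 0) + s
    return {g: hits[g] / totals[g] for g in totals}


def statistical_parity_difference(groups, selected):
    """Gap between the highest and lowest group selection rates (0 means parity)."""
    rates = selection_rates(groups, selected)
    return max(rates.values()) - min(rates.values())


# Hypothetical data: group label per lead, and 1 if the lead was prioritized.
groups = ["north", "north", "north", "north", "south", "south", "south", "south"]
selected = [1, 1, 1, 0, 1, 0, 0, 0]

print(selection_rates(groups, selected))                # {'north': 0.75, 'south': 0.25}
print(statistical_parity_difference(groups, selected))  # 0.5
```

A large gap does not prove unfair treatment on its own, but it is a signal that the scoring results deserve a closer review for that group.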

Reproducing these Processes With Minimal Effort For Your Own Data#

The intent of this project is to enable business users to understand how Dataiku can be used to score leads by predicting their expected value and expected conversion rate.

We have provided several suggestions on how to use historical lead information and customer information to build lead analysis and assign a score value. However, the best approach will ultimately depend on your specific needs and the data of interest. If you are interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.