Solution | Optimizing Omnichannel Marketing#

Overview#

Business Case#

Pharmaceutical companies depend on strategic marketing campaigns to increase the reach and awareness of their products and ultimately boost sales. The challenge is to understand the relationship between marketing spend and sales impact for individual healthcare providers (HCPs) and to build an omnichannel marketing strategy that targets prospects and clients with the right content at the right time.

Adopting an analytics-enabled omnichannel commercial model has shown a significant global market impact: a 5-10% increase in healthcare provider satisfaction, a 3-5% increase in prescribers, and a 10-20% improvement in marketing efficiency and cost savings. At the same time, building and managing an omnichannel strategy has become more complex as digital innovation multiplies the available communication channels.

This Dataiku solution supplies an initial framework for customers to adopt and test the value of an omnichannel approach on their own data, while learning how to identify essential brand and sales adoption drivers to design more competent and efficient marketing campaigns.

The journey of digital marketing begins with omnichannel marketing and sales analysis, which provides the data foundation. Brand adoption strategies create a consistent brand presence across channels. Furthermore, customer journey mapping, channel attribution, customer segmentation, and channel affinity help shape the omnichannel strategy by understanding customer behavior and preferences.

Uplift models and Next Best Action recommendations fine-tune the execution, ensuring the right message reaches the right audience at the right time. Ultimately, the goal is to create a seamless and personalized customer experience that fosters brand loyalty, drives conversions, and delivers positive health outcomes in the pharmaceutical industry’s complex and regulated environment.

Installation#

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

Technical Requirements#

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 13.2+* instance.

  • All code scripts use Python 3.6.

  • To benefit natively from all the Dataiku automation, we suggest reconfiguring one of the following connections:

    • PostgreSQL

    • Snowflake

Data Requirements#

The solution requires the following input datasets. Please read carefully as several features need to be prepared in the specified schema and name format.

Dataset

Description

Transactions_input

Should contain weekly product quantity sales over time (year preferably) for individual HCP accounts in the following format:

  • account_id (string): ID of the account provider/holder (HCP)

  • product_id (string): ID of the product (brand)

  • date (datetime): timestamp when the transaction was placed, prepared in the date format MM/DD/YY

  • product_quantity (integer): number of products

Product_input

Is a lookup between product_id, the market brand_name for a drug and the unit_price. The dataset should contain the following:

  • brand_name (string): market name

  • product_id (string): ID of the brand

  • unit_price (double): market price for an individual unit

Providers_input

Is unique at the specific healthcare provider level (variable account_id) of a given hospital or clinic (variable parent_account_id). These records provide insight into the specific practitioners to whom outreach is directed, and some basic information about the hospital where they work. The dataset should contain the following columns:

  • parent_account_id (string): ID of the parent account

  • parent_account_type (string): type of the parent account (hospital, clinic, private practice etc)

  • account_id (string): ID of the account provider/holder (HCP)

  • account_specialty (string): main HCP specialty

  • email_preferences (string): categorical feature (opt-in or opt-out). It can also be replaced with a flag indicating whether or not you hold the email or contact information of the account.

  • account_tenure (double): duration that the account has been active. The user decides how to generate this value (from first communication or first purchase) and the unit in which it is interpreted in the insights (days, months, or years).

    Note

    Users can add as many provider characteristics (categorical or numerical) as are available in their own data. These parameters will be used to analyze the different personas and their relation to brand adoption (preferences).

    Examples: promotions, in-office and training events, search engine features, and social media ads.


Omnichannel_input

Contains all the marketing outreach to an HCP for a given date over a period of time that matches the transactions period. These data usually contain web log analytics, email click-through rates, and other in-person or digital interactions. Required variables are account_id, product_id, campaign_id, and date, as described above. Further instructions below:

  • Users should prepare the date in the format MM/DD/YY so that it aligns with the transaction data.

  • You should include at least a few characteristics; otherwise, none of the analyses in this project can be executed.

  • ALL characteristics should be in binary 0/1 format, indicating whether the communication occurred in a given week. If you have categorical features, you should preprocess them using dummy encoding or other binarization methods. This is necessary because the data are later aggregated both by week and by HCP account.

  • Every column name should end with one of the suffixes _attempt, _success, or _avg_time_min, in this exact format (lowercase letters with a leading underscore).

    • _attempt shows the marketing effort (e.g., how many emails or call invites were sent over a week).

    • _success indicates the user response (e.g., how many emails they opened, how many web calls they participated in, or their website traffic activity).

    • _avg_time_min is an estimate of the interaction time in minutes, when relevant.


to_score_input

Consists of the test set for the brand adoption modeling session. This dataset should contain ALL the features you activate in the Project Setup for the brand adoption training dataset and model. If features are missing, a check scenario running in the background will raise an error.
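The format rules above can be checked before loading data into the project. The following is a minimal sketch in plain Python, not the check scenario shipped with the Solution. The function names check_omnichannel_rows and missing_features and the sample rows are hypothetical, and it assumes that the 0/1 rule applies to _attempt and _success columns only, since _avg_time_min holds minutes.

```python
from datetime import datetime

# Column names and suffixes taken from the Data Requirements above.
REQUIRED_KEYS = {"account_id", "product_id", "campaign_id", "date"}
ALLOWED_SUFFIXES = ("_attempt", "_success", "_avg_time_min")


def check_omnichannel_rows(rows):
    """Return a list of human-readable problems found in omnichannel rows."""
    problems = []
    for i, row in enumerate(rows):
        missing = REQUIRED_KEYS - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        try:
            # MM/DD/YY corresponds to the strptime pattern %m/%d/%y.
            datetime.strptime(row["date"], "%m/%d/%y")
        except (TypeError, ValueError):
            problems.append(f"row {i}: date {row['date']!r} is not MM/DD/YY")
        for name, value in row.items():
            if name in REQUIRED_KEYS:
                continue
            if not name.endswith(ALLOWED_SUFFIXES):
                problems.append(f"row {i}: column {name!r} lacks a required suffix")
            # Assumption: only _attempt and _success columns must be 0/1.
            elif not name.endswith("_avg_time_min") and value not in (0, 1):
                problems.append(f"row {i}: column {name!r} is not binary 0/1")
    return problems


def missing_features(selected_features, to_score_columns):
    """List the selected features that are absent from the data to score."""
    return sorted(set(selected_features) - set(to_score_columns))


# Hypothetical sample data: the second row breaks three rules.
sample_rows = [
    {"account_id": "A1", "product_id": "P1", "campaign_id": "C1",
     "date": "01/15/24", "email_attempt": 1, "email_success": 0,
     "call_avg_time_min": 12.5},
    {"account_id": "A2", "product_id": "P1", "campaign_id": "C1",
     "date": "2024-01-15", "Email": 1, "web_success": 3},
]
problems = check_omnichannel_rows(sample_rows)
absent = missing_features(["email_success", "account_tenure"],
                          ["account_id", "email_success"])
```

Here problems lists the wrongly formatted date, the column without a suffix, and the non-binary value in the second row, while absent names the one selected feature that the data to score lacks.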

Workflow Overview#

You can follow along with the solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow Zones.

The project has the following high-level steps:

  1. Connect your data as input, and select your analysis parameters via the Project Setup.

  2. Harmonize your marketing channel data with provider characteristics and sales transactions in an adaptable Flow.

  3. Apply descriptive analytics to show marketing outreach and sales relations to evaluate campaign effectiveness.

  4. Train classification models, and score new HCP accounts likely to adopt a brand based on channel engagement.

  5. Direct marketing investments and outreach via user interactive visualizations that display graphs and ML explainability tools.

Walkthrough#

Note

In addition to reading this document, it is recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Plug and Play your HCP and Channel Data#

Kickstart your work by customizing the project parameters through a visual interface. This Project Setup can be found on the project overview page.

You can walk through the steps to add your data, and select the analysis parameters to run. Users of any skill level can input their desired parameters for analysis and view the results directly in interactive dashboards.

The setup has two options for data input: data upload or data connection. The Run scenario button checks the format and schema of the input data. If this step fails, your data do not comply with the required data model; check the scenario logs to see which columns are missing or have the wrong type. Once the data are correctly connected, run the scenario next to Data Preparation and Filter Selection. It blends and preprocesses all the data sources to create a dataset for descriptive visual analytics and the primary input dataset for machine learning modeling.

Dataiku Application Data Upload. Dataiku Application Data Connection.

Relate Marketing Outreach to Future Sales Deviation#

To understand the relationship between marketing outreach and sales, we need to look at the change in sales after a given behavior occurs. This means the effect of a marketing campaign in a given week can only be seen in the sales of the following X weeks. Select the parameters below to filter and prepare the data for predicting sales deviation (increase, decrease, or constant). A record counts as constant when the change in sales stays within a user-defined threshold. Then select the numerical and categorical features for the multiclass classification session. To explore the generated datasets and processes, switch to the Project View below or go straight to the visual dashboard.

Dataiku Application Sales Deviation. Dataiku Flow Data Sales Deviation.

The Lead Feature Generation Zone filters the data to the user-selected brand. The Window recipe groups sales by account_id and product_id for each week and calculates the difference between current sales and sales X weeks later. The Prepare recipe generates the target column sales_deviation by comparing the current and future sales against the user’s threshold.

If the difference is within the boundaries of sales_deviation_upper_filter and sales_deviation_lower_filter, the record is labeled as constant; otherwise, it is labeled as increase or decrease accordingly. The Sales Deviation Modeling Zone joins the provider’s characteristics to the aggregated sales and channel dataset and trains a multiclass classification model.
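The labeling logic described above can be illustrated outside of Dataiku. The following is a minimal sketch in plain Python, not the Window and Prepare recipes of the Solution. The function name label_sales_deviation and the integer week index are hypothetical, and it assumes the difference is an absolute change in quantity (the text does not say whether the threshold is absolute or relative).

```python
from collections import defaultdict


def label_sales_deviation(weekly_sales, lead_weeks, lower, upper):
    """Label each (account, product, week) by its sales change lead_weeks later.

    weekly_sales: iterable of dicts with account_id, product_id,
    week (integer week index) and product_quantity.
    lower / upper play the role of sales_deviation_lower_filter and
    sales_deviation_upper_filter.
    """
    # Group quantities per (account_id, product_id), keyed by week.
    series = defaultdict(dict)
    for row in weekly_sales:
        key = (row["account_id"], row["product_id"])
        series[key][row["week"]] = row["product_quantity"]

    labeled = []
    for (account_id, product_id), by_week in series.items():
        for week in sorted(by_week):
            future = by_week.get(week + lead_weeks)
            if future is None:
                continue  # no observation X weeks ahead, so no label
            # Assumption: absolute difference between future and current sales.
            diff = future - by_week[week]
            if lower <= diff <= upper:
                label = "constant"
            elif diff > upper:
                label = "increase"
            else:
                label = "decrease"
            labeled.append({
                "account_id": account_id,
                "product_id": product_id,
                "week": week,
                "sales_difference": diff,
                "sales_deviation": label,
            })
    return labeled


# Hypothetical weekly sales for one account and one product.
sales = [
    {"account_id": "A1", "product_id": "P1", "week": w, "product_quantity": q}
    for w, q in enumerate([10, 11, 20, 5])
]
labels = label_sales_deviation(sales, lead_weeks=1, lower=-2, upper=2)
deviation_by_week = {r["week"]: r["sales_deviation"] for r in labels}
```

With a one-week lead and a threshold of plus or minus 2 units, week 0 is labeled constant, week 1 increase, and week 2 decrease. The last week gets no label because no future sales exist for it.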

Classify and Target HCP Accounts Likely to Adopt a New Brand#

Following the Sales and Marketing Analysis, this section uses the transaction data to generate a binary brand_adoption feature that indicates whether or not an HCP has “adopted” (purchased/prescribed) a product before. This forms a classification problem.

Select one or multiple brands and a period filter for the input training dataset. Numerical and categorical input features can be selected flexibly from the input omnichannel and provider characteristics datasets.

We train a machine learning algorithm for those features and enable users to score new data with the same features (check that the data to score includes all the selected features). We further extract feature importance and individual feature values for each account ID to explain which factors impact (positively or negatively) the probability of adoption through visual displays in the Brand Adoption Dashboard.

Dataiku Application Brand Adoption. Dataiku Flow Data Brand Adoption.

Recipes in the Total Sales and Marketing by HCP (Account ID) Flow Zone filter and group the channel data by individual HCP (account_id). The Brand Adoption Modeling Flow Zone computes the target column brand_adopted based on whether the aggregated feature product_quantity_weekly from overall transactions is positive or zero. The machine learning session trains a classification model using XGBoost algorithms and scores new user input data.
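The target computation described above can be sketched in a few lines. This is an illustrative plain-Python version, not the recipes of the Flow Zone. The function name brand_adoption_target and the sample records are hypothetical, and it assumes that accounts with no transactions for the brand count as not adopted.

```python
def brand_adoption_target(transactions, account_ids, product_id):
    """Return {account_id: 1 or 0} for whether the account bought the product.

    transactions: iterable of dicts with account_id, product_id and
    product_quantity, as in Transactions_input.
    """
    # Start every known account at zero so non-buyers are kept (assumption).
    totals = {account_id: 0 for account_id in account_ids}
    for row in transactions:
        if row["product_id"] == product_id and row["account_id"] in totals:
            totals[row["account_id"]] += row["product_quantity"]
    # brand_adopted is 1 when the aggregated quantity is positive, else 0.
    return {account_id: int(qty > 0) for account_id, qty in totals.items()}


# Hypothetical transactions: A1 bought P1 twice, A2 has a zero-quantity row,
# and A3 has no transactions at all.
transactions = [
    {"account_id": "A1", "product_id": "P1", "product_quantity": 3},
    {"account_id": "A1", "product_id": "P1", "product_quantity": 2},
    {"account_id": "A1", "product_id": "P2", "product_quantity": 7},
    {"account_id": "A2", "product_id": "P1", "product_quantity": 0},
]
brand_adopted = brand_adoption_target(transactions, ["A1", "A2", "A3"], "P1")
```

The resulting dictionary can then serve as the binary target for any classifier. The Solution itself trains the model in the Dataiku machine learning session.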

Sharing Dynamic Sales and Marketing Insights#

Sales Overview and Marketing Activity by Brand#

The Marketing Activity by Brand tab shows the different activities for a user-selected brand (product) through the available channels. Marketing teams and product providers can track user engagement through multiple channels and highlight the relationship with sales. Note that the channel attempt indicates the marketing team’s efforts through various communication channels, and success indicates the user’s response to the relevant campaign efforts.

The final tab, Sales Deviation Analysis, shows how we formulate this multiclass classification problem and the results from the machine learning Lab. These include explainability tools such as Variable Importance, which shows which factors are driving the predicted sales changes, and what-if analysis, which allows both technical users and business experts to test different combinations of inputs and review whether specific changes would yield different outcomes.

Marketing Activity by Brand Sales Deviation Analysis

Brand Adoption by HCP#

The dashboard displays the training and results of the classification problem on whether an individual healthcare provider is expected to adopt (prescribe) a brand given its characteristics and past marketing activities.

The first tab shows the training data and the modeling results through the confusion matrix. Beyond evaluating the effect of different combinations of variables on the predicted target (binary brand_adoption), the what-if analysis also shows the exact features preselected by the user for the training session.

The Score New Data tab shows the results for individual HCPs from the input test dataset. From the standard webapp, the user can select an Account ID, save it, run the scenario, and refresh the visual displays. The first graph shows the feature importance for this prediction, derived from Shapley (SHAP) values. The last graph shows the omnichannel history of the selected account. Combined with the SHAP importance, it can be used to quantify the marketing effect on this account and to build a profile with positive and negative influence metrics.

Brand Adoption Shap values Brand Adoption History

Responsible AI Considerations#

This project makes use of marketing data and personal information related to HCPs. While the sample datasets do not contain personal information (such as age, gender, or race) related to an HCP to drive analysis, real-world data may include these features and should be treated with certain considerations. These considerations should be incorporated across three areas: data, model, and reporting.

  1. Data Check or Model Robustness: If the underlying data about HCPs includes sensitive attributes such as age, race, or gender, we recommend that users measure any potential biases in how marketing and engagement are conducted. For instance, statistical tests, such as a chi-square test or tests for normality, can help confirm whether any meaningful skew exists in the data, such as people over a certain age preferring different types of media (e.g., print vs. other media). If such a skew or bias exists in the data, it should be noted and handled with preprocessing or in-processing techniques. Additionally, if users wish to avoid using the sensitive features in the downstream analysis, they should check for potential proxies using correlation tests and proxy models.

  2. Reporting: Regarding reporting on the models developed in this project, the dashboard already provides several insights and feature explanations for individual predictions. These tools are important to build so that end users can make decisions with the full context in mind and understand how a given prediction is generated.
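As an illustration of the chi-square check mentioned under Data Check, the following minimal sketch computes the test statistic for a contingency table in plain Python. The function name chi_square_statistic and the counts are hypothetical and made up for the example. A statistics library would normally also return the p-value, whereas this sketch compares the statistic with the standard 5% critical value for one degree of freedom (3.841).

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    table: list of rows of observed counts, e.g. age group x preferred channel.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up counts: rows are two age groups, columns are two channel preferences.
observed = [
    [30, 10],
    [10, 30],
]
statistic, dof = chi_square_statistic(observed)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
skew_detected = dof == 1 and statistic > 3.841
```

A statistic above the critical value suggests that channel preference is not independent of the age group in this made-up table. That is the kind of skew that should be noted and handled before modeling.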

Reproducing these processes with minimal effort for other brands and products#

This project intends to enable marketing teams to understand how Dataiku can be used to assess their Omnichannel marketing strategies’ past and future success either by starting a new project from scratch or adapting this existing project to one’s specific needs. A deeper technical walkthrough of the project can be found within the wiki to aid in reproducing this project. Roll-out and customization services can be offered on demand.