Solution | Next Best Action for Pharma#

Overview#

Business case#

Pharmaceutical field teams engage with hundreds of HCPs (Healthcare Providers) per territory through multiple channels, such as rep visits, calls, emails, webinars, and digital ads, while managing limited budgets and constrained rep time. Traditional engagement approaches rely on intuition, static segmentation, or broad channel strategies, leading to:

  • Wasted rep time on low-value HCPs

  • Channel saturation (too many emails, not enough high-touch interactions)

  • Budget inefficiency (expensive channels applied to low-probability engagements)

  • Missed opportunities with high-value, responsive HCPs

This Solution transforms HCP engagement from reactive, one-size-fits-all outreach to data-driven, personalized recommendations, combining machine learning, mathematical optimization, and conversational AI to recommend the right channel for the right HCP, while respecting budget, capacity, and compliance constraints.

Installation#

  1. From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.

  2. Search for and select Next Best Action for Pharma.

  3. If needed, change the folder into which the Solution will be installed, and click Install.

  4. Follow the modal to either install the technical prerequisites below or request an admin to do it for you.

Note

Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Technical requirements#

To leverage this Solution, you must meet the following requirements:

  • Have access to a Dataiku 14.5.1+ * instance.

  • A Python 3.10 code environment named solution_next-best-action with the following required packages:

    • dash

    • flask

    • pulp

    • dash-bootstrap-components

  • The Time series preparation plugin v2.1.2+.

  • The Agent Hub plugin v1.2.4+.

  • The agentic system requires an SQL connection for the semantic model, a stronger Large Language Model (LLM) for planning and reasoning, a simpler LLM for basic tasks, and an embedding model for vectorization tasks.

Data requirements#

The project ships with sample data across four datasets using the filesystem connection. Replace each with your organization’s data before running the pipeline.

Dataset

Description

Engagement_history

Historical record of all HCP-channel interactions, including channel type, timing, and outcome.

KPI_metrics

Quantified KPI values per HCP over time (default: Rx / prescriptions). Can be replaced with any measurable business KPI.

Engagement_action_catalog

Catalog of all available engagement actions (channels), including cost and expected channel uplift.

HCP_master

Master table of HCP attributes. At a minimum, an hcp_id is required. Additional columns are flexible based on data availability.

When connecting your own data, ensure the following:

  • Engagement_history and KPI_metrics must be aggregated at the same time frequency (weekly or monthly) and cover the same period.

  • All required columns must be present in Engagement_history, KPI_metrics, and Engagement_action_catalog.

  • Provide at minimum an hcp_id in HCP_master. All other HCP attributes are optional.

  • Update the NBA_cycle_start variable in the Project Setup to reflect your current planning cycle.

Workflow overview#

Dataiku screenshot of the final project Flow showing all Flow zones.

You can follow along with the Solution in the Dataiku gallery.

The project Flow is organized into multiple Flow zones that move data from raw inputs through feature engineering, ML scoring, and MILP optimization to a set of deployable HCP recommendations. Each zone maps directly to a section in the walkthrough below. Once the project is configured, the Build All scenario executes the full pipeline end to end.

Flow zone

Purpose

Input data & consolidation

Ingests HCP profiles, engagement history, and KPI datasets and aligns them to a common time window.

Feature engineering

Builds rolling engagement metrics, recency signals, prescription trends, and HCP profile attributes into a single HCP-channel modeling dataset.

ML scoring & impact calculation

Trains a classification model to predict engagement probability per channel and computes expected prescription impact and lift for each HCP-channel pair.

Next best action optimization

Applies the MILP optimizer to select the highest-impact subset of actions across the HCP portfolio within budget, capacity, and channel constraints.

Explainability

Combines SHAP values with feature definitions to generate plain-language recommendation summaries for sales reps via a Prompt recipe.

Agent & compliance enrichment

Processes compliance document embeddings and prepares the Knowledge Base for the NBA Assistant.

Visualizations

Aggregates recommendation outputs into the datasets that power the four dashboards.

Walkthrough#

Note

In addition to reading this document, it’s recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Configure connections and set business rules#

To begin, open the Project Setup. This is where you define the data connections, LLM models, and business rules that govern the optimization model.

Connection configuration#

Select the connections for the project datasets and the models driving the agents. The agentic system requires an SQL connection for the semantic model, a stronger model for planning and reasoning, a simpler model for basic tasks, and an embedding model for vectorization tasks.

Dataiku screenshot of the project setup connection configuration.

Tip

The following connections have been validated for this Solution:

  • SQL connections: PostgreSQL, Snowflake

  • Completion models: gpt-4o, OpenAI models, Anthropic Claude models

  • Embedding models: text-embedding-3-small (OpenAI)

Select and test the models that best align with your organization’s data requirements. Iterative testing is recommended.

Budget constraints and rules configuration#

Set your budget constraints, rep capacity limits, channel availability windows, and any HCP-level exclusions before proceeding.

Dataiku screenshot of the budget constraints and business rules configuration.

Shape the features that drive the model#

The Solution includes a dedicated feature engineering Flow zone that transforms the four input datasets into a single modeling dataset where each row represents an HCP-channel combination.

The feature set covers four broad categories:

  • HCP profile attributes: Specialty, practice type, and geography.

  • Channel-specific engagement history: Contact frequency, recency, and historical response rates per channel.

  • Rolling Rx KPI trends: Trends computed over 4- and 12-week windows.

  • Action catalog metadata: Channel cost and channel type.

Rolling window sizes and which HCP attributes are included can be adjusted directly in the Flow to match your cycle cadence and therapeutic area.

Note

After building features in the Flow, populate the feature_definition dataset with a plain-language name and description for each feature. This is used by a Prompt recipe to translate model inputs into human-readable explanation language for sales rep summaries.

Descriptions should be written in business terms, not technical ones. For example:

Feature name

Good description

Poor description

rolling_rx_4w

Number of prescriptions written by this HCP in the last 4 weeks

rolling_rx_4w lag feature

last_rep_visit_days

Number of days since the HCP’s last face-to-face rep visit

days since last visit int

email_open_rate_12w

Share of marketing emails opened by this HCP over the last 12 weeks

email_open_rate feature

From predictions to an optimal action plan#

The Solution chains two analytical layers to move from raw data to a deployable action plan.

First, a classification model is trained to predict the probability of HCP engagement for each channel, using the features assembled in the previous step. SHAP (SHapley Additive exPlanations) values are computed for every scored HCP-channel pair, surfacing the top drivers behind each engagement probability such as specialty, recency of last contact, Rx trend, in human-readable labels that feed directly into the agent’s recommendation justifications.

These scored probabilities and their explanations are then passed to a Mixed Integer Linear Programming (MILP) optimizer, which allocates actions to maximize prescription impact across the full HCP portfolio, subject to the budget, channel capacity, and rep time constraints defined in the Project Setup.

SHAP values are then combined with the feature_definition descriptions and actual feature values in a Prompt recipe that generates plain-language recommendation summaries for each HCP-channel pair, surfacing the top drivers in terms sales reps can act on.

Tip

Review both the Machine Learning Analysis and the Optimization problem in the Flow to confirm they reflect your data and your organization’s commercial priorities before proceeding.

Ground the agent in your compliance and explainability layer#

Warning

Agent warnings This Solution includes an embedded conversational agent linked to your recommendation outputs, compliance Knowledge Base, and CRM tools. The agent prompts are tailored to the provided sample data and use case. You must review and adapt these prompts to your own context before using them in production.

The Solution includes an embedded conversational AI agent, the NBA Assistant, that enables sales reps and marketing operations teams to interact with recommendation outputs in natural language, without leaving the dashboard. Reps can ask which HCPs to prioritize, request a plain-language explanation of why a specific action was recommended, draft a compliant outreach email, or log a CRM interaction; all within a single interface.

The agent draws on two complementary knowledge sources that should be configured before running the Solution for the first time.

RAG Knowledge Base: Upload your organization’s compliance policies and approved product reference documents to the Compliance Folder in the project. The agent will draw on them at inference time to ground any outreach content, product references, and channel justifications it generates.

Dataiku screenshot of the NBA Assistant agent conversation.

Tip

Keeping compliance documents current is critical. Establish a review cadence aligned to your label update cycle.

Run the Build All scenario and review results#

Once all setup steps are validated, run the Build All scenario from the Dataiku scenario panel. After completion, review outputs in:

  • final_dataset: The full NBA recommendation table with channel assignments, expected impact, and cost per HCP.

  • Next Cycle Action Plan dashboard: The primary interactive view for sales reps, managers, and commercial excellence teams.

Dashboards#

NBA recommendations overview#

Target personas: Sales Representative, Marketing Operations

This dashboard provides an operational, weekly planning view for sales representatives and marketing operations teams. It displays cycle-level budget allocations, channel distributions against capacity limits, and a ranked list of the highest-value HCP opportunities for the current cycle. Users can access the embedded NBA Action Assistant directly from the dashboard to run ad-hoc queries, draft compliant outreach, and log CRM interactions without leaving the view.

Dataiku screenshot of the NBA Recommendations Overview dashboard.

Territories overview#

Target personas: Territory Manager, Sales Manager, Commercial Leadership

Territory-level oversight and gap detection for weekly team planning and resource allocation. Combines headline KPIs, a sortable territory performance table, and a channel mix analysis that surfaces territories over-reliant on a single channel. A dedicated gap detection view highlights high-value HCPs who received no recommended action in the current cycle, giving managers a clear starting point for manual review and override.

Dataiku screenshot of the Territories Overview dashboard.

Commercial monitoring#

Target personas: Commercial Excellence, Data Science

System performance monitoring and budget scenario planning for commercial excellence and data science teams. Tracks rolling Rx trends, engagement rate trajectories, and model performance indicators across cycles to flag drift before it reaches recommendations. An embedded what-if scenario tool lets teams simulate alternative budget and capacity configurations and compare results against the live baseline before committing to a change.

Dataiku screenshot of the Commercial Monitoring dashboard.

Sales rep interactive app#

Target personas: Sales Representative, Marketing Operations

A Dash webapp for reps who need more granular control than the native dashboard provides. Supports simultaneous filtering across specialty, territory, channel, engagement probability band, and Rx trend direction, with a fully sortable action table. Clicking any row opens a drill-down panel showing the feature contribution breakdown, historical engagement timeline, and pre-generated outreach draft for that HCP.

Dataiku screenshot of the Sales Rep Interactive App.

See the Sales Rep Interactive App article for full documentation.

Responsible AI considerations#

A responsible AI approach to a Next Best Action solution in the pharmaceutical context requires careful attention to transparency, fairness, compliance, and appropriate use.

  • Explainability: Recommendations should be explainable in clear, business-relevant terms so that field teams understand the rationale and can apply their own judgment rather than follow outputs blindly.

  • Fairness: The system must avoid bias by ensuring that recommendations aren’t driven by inappropriate or sensitive attributes and that different HCP segments are treated equitably.

  • Data quality and compliance: Data used should be accurate, up to date, and compliant with privacy and regulatory standards, with no use of restricted or promotional-inappropriate signals.

  • Decision support framing: NBA outputs should indicate likelihood of engagement rather than guarantee outcomes, complementing rather than replacing human expertise.

  • Ongoing governance: Regular monitoring, validation, and governance processes should be in place to ensure the model continues to perform as intended and aligns with evolving ethical and regulatory expectations.

See also

We encourage users to check out Dataiku’s Responsible AI course to learn more.

Reproduce these processes with minimal effort#

This Solution is designed to enable pharmaceutical commercial teams to understand how Dataiku can accelerate a data-driven approach to HCP engagement by combining machine learning, optimization, and Generative AI.

Ultimately, the best approach will depend on your organization’s specific data, therapeutic area, and commercial strategy. If you are interested in adapting this project to your organization’s specific goals and needs, Dataiku offers on-demand rollout and customization services.

This documentation has provided several suggestions on how to derive value from this Solution. Ultimately however, the “best” approach will depend on your specific needs and data. If you’re interested in adapting this project to the specific goals and needs of your organization, Dataiku offers roll-out and customization services on demand.