Solution | Clinical Trial Intelligence Assistant#
Overview#
Business case#
For a clinical operations manager, the search for optimal trial sites is a time-consuming effort across multiple disconnected sources. While competitors secure the best investigators, clinical operations managers often spend weeks manually piecing together data from ClinicalTrials.gov, CTMS records, and payment databases. This Solution replaces this manual process with an agentic system that allows domain experts to query data directly using natural language.
With this approach, IT and analytics teams no longer need to manually collect, wrangle, and precompute data for every new request. Business users receive actionable insights on demand. This isn’t just about efficiency; it’s about empowering your domain experts to make faster, more informed decisions that can accelerate the delivery of therapies to patients.
This Solution proposes an agentic framework that dynamically brings together information from multiple sources to support clinical operations (ClinOps) teams in making informed decisions—specifically around clinical site selection and principal investigator (PI) identification. This agentic system is hosted by Agent Hub, a collaborative workspace where employees can find, use, and create approved AI agents. At the same time, IT retains complete control over access to models, data, tools, and the agent lifecycle.
Key beneficiaries include:
Clinical operations teams
Data managers
IT teams
Installation#
From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.
Search for and select Clinical Trial Intelligence Assistant.
If needed, change the folder into which the Solution will be installed, and click Install.
Follow the modal to either install the technical prerequisites below or request an admin to do it for you.
From the Design homepage of a Dataiku instance connected to the internet, click + New Project.
Select Dataiku Solutions.
Search for and select Clinical Trial Intelligence Assistant.
Follow the modal to either install the technical prerequisites below or request an admin to do it for you.
Note
Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.
Technical requirements#
To leverage this Solution, you must meet the following requirements:
Access to a Dataiku 14.2+ instance.
An internal code environment for retrieval-augmented generation (RAG). Please refer to Dataiku’s documentation.
The SQL Question Answering Agent Tool plugin v1.1.3+.
The Agent Hub plugin v1.0.4+.
The agentic system requires an SQL connection for its SQL query tool, a stronger Large Language Model (LLM) for planning and reasoning, a simpler LLM for basic tasks, and an embedding model for vectorization tasks.
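The version minimums listed above can be checked with a few lines of Python before installation. This is a minimal sketch, not part of the Solution: it assumes you have already looked up the installed versions yourself (for example, on the instance's administration pages) and typed them into a dictionary. The function names and the example inventory are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.1.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True when `installed` is at least `minimum`.

    Shorter versions are padded with zeros, so '14.2' compares as '14.2.0'.
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimums copied from the technical requirements list.
MINIMUMS = {
    "Dataiku": "14.2",
    "SQL Question Answering Agent Tool plugin": "1.1.3",
    "Agent Hub plugin": "1.0.4",
}


def unmet_requirements(installed):
    """List the components that are missing or older than the stated minimum."""
    problems = []
    for name, minimum in MINIMUMS.items():
        found = installed.get(name)
        if found is None or not meets_minimum(found, minimum):
            problems.append(name)
    return problems


# Hypothetical inventory, typed in by hand for illustration.
example_installed = {
    "Dataiku": "14.2.1",
    "SQL Question Answering Agent Tool plugin": "1.1.2",
    "Agent Hub plugin": "1.0.4",
}
print(unmet_requirements(example_installed))
```

Comparing versions as integer tuples avoids the string-comparison trap in which "14.10" sorts before "14.2".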
Data requirements#
The Solution includes one prepackaged dataset and can optionally import several datasets from two existing Dataiku Solutions.
The prepackaged dataset US_ZIP_COUNTY maps ZIP codes to their corresponding U.S. states, counties, and cities.
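To illustrate what a ZIP-to-county mapping of this kind supports, the sketch below builds a small lookup in plain Python. The column names (`zip`, `state`, `county`, `city`) and the two sample rows are assumptions made for illustration; check the actual schema of US_ZIP_COUNTY in the project before relying on them.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; the column names are assumptions, not the real schema.
SAMPLE_CSV = """zip,state,county,city
02139,MA,Middlesex County,Cambridge
10001,NY,New York County,New York
"""


def load_zip_index(handle):
    """Index rows by ZIP code, keeping every row because a ZIP can span counties."""
    index = defaultdict(list)
    for row in csv.DictReader(handle):
        # ZIP codes stay as text so leading zeros survive.
        index[row["zip"].strip().zfill(5)].append(row)
    return index


def counties_for_zip(index, zip_code):
    """Return the sorted distinct (state, county) pairs recorded for a ZIP code."""
    rows = index.get(str(zip_code).strip().zfill(5), [])
    return sorted({(row["state"], row["county"]) for row in rows})


zip_index = load_zip_index(io.StringIO(SAMPLE_CSV))
print(counties_for_zip(zip_index, "2139"))
```

Two design choices matter here: ZIP codes are handled as zero-padded strings rather than integers, and the lookup returns a list because a single ZIP code can cross county lines.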
The following datasets can be imported from the two Dataiku Solutions (optional):
Clinical Site Intelligence Solution:
studies_features_w_sdoh_scored_prepared
studies_similarity_features_prepared
overall_officials_concat
studies_w_sites_joined
interventions_distinct
Social Determinants of Health Solution:
new_measure_final_dataset_county
Workflow overview#
The project has the following high-level steps:
Ingest datasets from selected Dataiku Solutions (Optional)
Build a study knowledge bank
Launch the Agent Hub webapp
Walkthrough#
Note
Before beginning, it’s recommended to read the project wiki in addition to this document. The wiki provides a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.
Building Solution via project setup#
You must use the project setup to build the Solution and give users access to the Agent Hub webapp.
Connection configuration#
Select the connections for the project datasets and the models driving the agents. The agentic system requires an SQL connection for its SQL query tool, a stronger model for planning and reasoning, a simpler model for basic tasks, and an embedding model for vectorization tasks.
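The four selections described above can be tracked as a simple checklist while you prepare the project setup. The sketch below is illustrative only: the class, its field names, and the placeholder identifiers are hypothetical and aren't part of the Solution or of any Dataiku API.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgentConnections:
    """The four selections the project setup asks for (field names are hypothetical)."""

    sql_connection: Optional[str] = None   # used by the SQL query tool
    planning_llm: Optional[str] = None     # stronger model for planning and reasoning
    basic_llm: Optional[str] = None        # simpler model for basic tasks
    embedding_model: Optional[str] = None  # used for vectorization tasks

    def missing(self):
        """Return the names of roles that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder identifiers; substitute whatever your instance exposes.
draft = AgentConnections(sql_connection="my_postgres", planning_llm="my_strong_llm")
print(draft.missing())
```

An empty list from `missing()` means every role has a selection; it says nothing about whether the chosen connection or model actually works, which still needs testing on your instance.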
Tip
The following connections were tested on this Solution.
SQL connections: PostgreSQL, Amazon Redshift, Snowflake, Google BigQuery
Completion models: GPT-4o, GPT-4.1, GPT-5 (OpenAI); Claude 3.5, 3.7, 4 (Anthropic); Gemini 2.5 Flash, 2.5 Pro (Google)
Embedding models: embedding v3 (OpenAI)
Ultimately, however, the best connections and models will depend on your specific needs and data. It’s recommended to test iteratively to find what produces the best results.
Importing Solution insights (optional)#
Select the project or the Dataiku Application instance for the Clinical Site Intelligence Solution and Social Determinants of Health Solution, then select the required dataset for each input.
Tip
The Solution is prepackaged with 20,000 cancer studies and U.S. socioeconomic metrics. You can skip this import and run the agent with the prepackaged data as a trial. However, we highly encourage users to carefully review this agentic framework, including its architecture and prompts, revise the framework as needed, and thoroughly test its output before implementation.
Additional configuration#
Email connection#
The agent requires an SMTP channel. Users can create the channel in Administration > Notifications & Integrations > login with Username-Password > Messaging channels > Mail (SMTP) channel.
To generate a password to sign in within the Mail (SMTP) channel, use this link.
Send message tool configuration#
Select the corresponding mail channel for the agent tool Sent Draft by Email.
Launch Agent Hub#
Once the pipeline is built, click the “Clinical Trial Intelligence Assistant” button to access the webapp and interact with the agent.
Agent Hub webapp: Clinical Trial Intelligence Assistant#
The Agent Hub webapp hosts the Clinical Trial Intelligence Assistant. During the project setup, a foundation model is assigned as a simple orchestrator to power the Agent Hub. Users can assign other agents as enterprise agents or create new agents in the Agent Hub webapp. Please refer to the documentation to learn more about Agent Hub settings.
Conversations#
Start a new conversation by including the required enterprise agents.
Validating agent outputs#
At the end of each agent’s response, click on “See details” to trace the agent’s activities. It reveals the agent’s reasoning process and the raw data it retrieved for a given task.
Visualize the agent activity log using the Trace Explorer.
Responsible AI statement#
This Solution leverages analytics and ML-driven insights to inform clinical site recruitment according to study protocol design. It’s critical, however, to remain aware of historical inequalities in clinical research recruitment. Patient enrollment has often under-represented communities defined by sex, gender, minority/ethnic background, or specific health conditions. Consequently, any data-driven approach may inherit biases present in clinical trial registries. Careful consideration of these limitations is essential when interpreting results.
The Solution also enhances clinical site intelligence with U.S. Social Determinants of Health (SDOH) data to promote recruitment diversity. This data is derived from community-level surveys and shouldn’t be used to make inferences about individuals’ socioeconomic status, minority/ethnic background, or household circumstances in predicting disease occurrence or outcomes. Self-reported survey data is particularly susceptible to recall, social desirability, and non-response biases. Any decisions or actions informed by this analysis must account for these potential biases and limitations.
While leveraging associations between regional, community-level characteristics and disease, this information should be used to advance health equity and improve therapeutic access—actively avoiding reinforcement or exacerbation of disparities or biases in health and life sciences systems. This approach can be extended to incorporate additional data, including Health Care Professional (HCP) or pharmacy geolocation information, as well as de-identified individual-level patient behavioral and clinical data in areas identified as potentially underserved.
Furthermore, any models developed to guide personalized patient-care journeys, health outreach programs, pricing strategies, or therapeutic delivery must undergo a thorough evaluation based on a robust Responsible AI ethics framework. This process ensures mitigation of biases, inclusion of all relevant subpopulations, and establishment of model interpretability and explainability.
See also
We encourage users to check out Dataiku’s Responsible AI course to learn more.
Reproduce these processes with minimal effort#
This project aims to show healthcare and life sciences professionals how Dataiku can accelerate a data-driven approach to clinical operations by leveraging diverse datasets.
By creating a single Solution that can inform the decisions of various teams within one organization or across several, you can use immediate insights to refine clinical site recruitment strategies for drug manufacturers.
This documentation has provided several suggestions on how to derive value from this Solution. Ultimately, however, the “best” approach will depend on your specific needs and data. If you’re interested in adapting this project to the specific goals and needs of your organization, Dataiku offers roll-out and customization services on demand.
