Solution | Pharmacovigilance#
Overview#
Business Case#
The efficiency of post-market drug safety surveillance functions plays a critical role in reinforcing patient safety and securing successful drug launches. Compliance in safety reporting and surveillance is a must-meet regulatory requirement (Good Pharmacovigilance Practice), and failure to appropriately report, detect, and address adverse drug reactions can lead to patient harm, drug recalls, and significant costs.
As the volume, velocity, and variety of safety reporting data grow, global safety teams at drug manufacturers, health outcomes research institutions, and regulatory bodies alike increasingly need analytics-driven approaches that can be automated at scale to improve early signal detection and the reliability of the pharmacovigilance process.
This plug-and-play solution provides a ready-to-use interface that accelerates the discovery of potential Adverse Drug Reaction (ADR) signals by computing disproportionality metrics for drug and adverse event pairs across various populations.
Installation#
The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.
Dataiku Cloud users should follow the instructions for installing solutions on cloud.
The Cloud Launchpad will automatically meet the technical requirements listed below and add the Solution to your Dataiku instance.
Once the Solution has been added to your space, move ahead to Data Requirements.
After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:
On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Pharmacovigilance.
Alternatively, download the Solution’s .zip project file and import it into your Dataiku instance as a new project.
Technical Requirements#
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 12.0+* instance with a built-in Python 3 environment (or create a Python 3 code env).
All code scripts use Python 3.6. Depending on the specifications of your instance, the packages these scripts rely on may need to be updated alongside future Python versions.
To benefit natively from all of the Dataiku Application's automation, you will need to configure one of the following connections:
PostgreSQL
Snowflake
Data Requirements#
The inputs of this solution are stored in two managed folders, which are useful for storing data that standard Dataiku Flow objects do not support.
Managed folder | Description
---|---
Product Drug Names | Contains the Product.txt file, imported by default from the Orange Book, which lists the drugs and pharmaceuticals that the U.S. Food and Drug Administration (FDA) has approved as both safe and effective.
Input Files | Requires at least five datasets, as .txt files, imported from the FDA Adverse Event Reporting System (FAERS). These datasets contain adverse event reports, medication error reports, and product quality complaints resulting in adverse events that were submitted to the FDA.
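FAERS quarterly ASCII files are plain text with `$` as the field delimiter and column names in the first row. The minimal sketch below shows how such a table could be parsed into rows with only the Python standard library; it is an illustration, not the Solution's `Faers_data_ingestion` recipe, and the function name `read_faers_ascii` is hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_faers_ascii(text: str) -> List[Dict[str, str]]:
    """Parse one FAERS ASCII table: '$'-delimited, header names in the first row."""
    # QUOTE_NONE: FAERS fields are not quoted, so a stray '"' in a drug name stays literal
    reader = csv.DictReader(io.StringIO(text), delimiter="$", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        rows.append({
            # Normalize header names to lower case and strip whitespace from values
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key  # drops "" (trailing '$' in the header) and None (overflow fields)
        })
    return rows


sample = "primaryid$caseid$sex$age\n1001$100$F$54\n1002$101$M$\n"
rows = read_faers_ascii(sample)
```

In practice you would pass the contents of a file such as DEMO23Q1.txt; check the encoding of your downloaded files before reading them, since it is not specified here.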
Workflow Overview#
You can follow along with the solution in the Dataiku gallery.
The project has the following high-level steps:
Ingest data files.
Process the data, detect duplicate reports, and filter on demographic, drug, reaction, and report characteristics.
Identify and visualize patterns in safety data.
Calculate metrics for statistical inference and signal detection.
Analyze new insights with a Dataiku Application.
Increase regulatory compliance with early detection of potential ADR signals.
Walkthrough#
Note
In addition to reading this document, we recommend reading the project wiki before you begin, to gain a deeper technical understanding of how this Solution was built and more detailed explanations of Solution-specific vocabulary.
Plug and Play FAERS quarterly data files#
The input files described above can be uploaded either directly into the managed folders or through the Dataiku Application interface. After uploading, users can reconfigure the connections of the Flow and choose to anonymize manufacturer and drug names for confidentiality reasons. Within the Input Files Flow zone, two Python recipes are applied to the initial managed folders:
Python recipe | Description
---|---
compute_FDA_products | Parses the FDA standard drug name .txt file into a data frame and exports it as a dataset.
Faers_data_ingestion | Reads the ASCII (.txt) files and converts them into data frames. The process checks file names against a prespecified regular expression and maps column codes to standard terms.
In this Flow zone, visual Distinct recipes are also used to keep only the values needed to run the rest of the Flow and generate the statistical analysis.
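The file name check can be pictured with a small sketch. The regular expression below assumes the usual FAERS naming scheme (table code, two-digit year, quarter, as in DEMO23Q1.txt), and the set of five required tables is an assumption based on the datasets the later Flow zones use; neither is taken from the Solution's code, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Assumed naming pattern for FAERS quarterly ASCII files, e.g. DEMO23Q1.txt:
# table code, two-digit year, quarter 1-4. Matching is case-insensitive.
FAERS_FILE_RE = re.compile(
    r"^(DEMO|DRUG|REAC|OUTC|INDI|RPSR|THER)(\d{2})Q([1-4])\.txt$", re.IGNORECASE
)

# Hypothetical minimum set: demographics, drug, reaction, outcome, indication
REQUIRED_TABLES = {"DEMO", "DRUG", "REAC", "OUTC", "INDI"}


def classify_faers_files(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names by FAERS table code; names failing the regex are ignored."""
    found: Dict[str, List[str]] = {}
    for name in filenames:
        match = FAERS_FILE_RE.match(name)
        if match:
            found.setdefault(match.group(1).upper(), []).append(name)
    return found


def missing_tables(filenames: Iterable[str]) -> List[str]:
    """Return the required table codes that have no matching file, sorted."""
    return sorted(REQUIRED_TABLES - set(classify_faers_files(filenames)))
```

For example, a folder containing only DEMO, DRUG, REAC, and OUTC files would report `["INDI"]` as missing.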
The Data Preparation (Drug) Flow zone extracts information from drug interactions and joins it with the indication dataset using visual split, group, and join recipes.
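A minimal sketch of the split and join logic follows, written in plain Python rather than visual recipes. It assumes the FAERS ASCII column names `primaryid`, `drug_seq`, `role_cod` (DRUG table) and `indi_drug_seq`, `indi_pt` (INDI table), and it treats "drug interactions" as filtering on drug role codes; that interpretation and the default role selection are assumptions, not the Flow's documented behavior.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def join_drug_indication(
    drugs: List[Row],
    indications: List[Row],
    roles: Sequence[str] = ("PS", "SS", "I"),
) -> List[Row]:
    """Left-join DRUG rows to INDI rows on (primaryid, drug sequence number)."""
    # Index indication terms by report and drug sequence; a drug may have several
    by_drug: Dict[Tuple[str, str], List[str]] = {}
    for ind in indications:
        by_drug.setdefault((ind["primaryid"], ind["indi_drug_seq"]), []).append(ind["indi_pt"])

    joined: List[Row] = []
    for drug in drugs:
        # "Split" step: keep only the selected drug roles
        # (PS primary suspect, SS secondary suspect, C concomitant, I interacting)
        if drug["role_cod"] not in roles:
            continue
        # Left join: a drug with no recorded indication gets an empty indi_pt
        for term in by_drug.get((drug["primaryid"], drug["drug_seq"]), [""]):
            joined.append({**drug, "indi_pt": term})
    return joined
```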
Once our initial data is imported and cleaned, we are ready to begin aggregating our data.
Setting the Data for Visual Insights#
The Data Standardization (Drug) Flow zone takes as input the previously prepared dataset of drug interactions joined with indications, along with a dataset of FDA product names. These two datasets are joined, and the FDA drug name is used to correct misspellings in the FAERS data.
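The exact matching logic the Solution uses to correct misspellings is not described here. The sketch below swaps in fuzzy string matching with Python's standard `difflib.get_close_matches`: an exact match against the FDA name list is tried first, then the closest name above a similarity cutoff. The function name and the 0.85 cutoff are illustrative assumptions.

```python
import difflib
from typing import Callable, Iterable, Optional


def build_standardizer(fda_names: Iterable[str], cutoff: float = 0.85) -> Callable[[str], Optional[str]]:
    """Return a function mapping a raw FAERS drug name to a canonical FDA name, or None."""
    canonical = sorted({n.strip().upper() for n in fda_names if n.strip()})
    canonical_set = set(canonical)

    def standardize(raw: str) -> Optional[str]:
        name = raw.strip().upper()
        if name in canonical_set:  # exact match after normalization
            return name
        # Otherwise take the single closest FDA name whose similarity ratio >= cutoff
        matches = difflib.get_close_matches(name, canonical, n=1, cutoff=cutoff)
        return matches[0] if matches else None

    return standardize


std = build_standardizer(["Aspirin", "Ibuprofen", "Metformin Hydrochloride"])
```

Names that return `None` would need manual review; a lower cutoff catches more typos at the cost of more false matches.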
Moving on to the Data Preparation (Demographics) Flow zone, we take the FAERS reaction, outcome, and demographics datasets, along with the joined drug name dataset from the previous Flow zone, as inputs. A visual Prepare recipe cleans the age, country, and date features of the demographic data. We then join the result to the other three datasets and compute a feature that represents the seriousness of an event based on its outcome codes. Additional recipes calculate metrics on the number of adverse events and anonymize the manufacturer and drug names (if selected in the Dataiku App).
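The seriousness feature can be illustrated with the FAERS outcome codes. The code-to-term mapping below follows the FAERS OUTC code list; the three-level labeling rule (fatal, serious, non-serious) and the column names `primaryid` and `outc_cod` are assumptions for illustration, not the Solution's documented formula.

```python
from typing import Dict, Iterable, Set

# FAERS outcome codes (OUTC table) mapped to their standard terms
OUTCOME_TERMS = {
    "DE": "Death",
    "LT": "Life-Threatening",
    "HO": "Hospitalization - Initial or Prolonged",
    "DS": "Disability",
    "CA": "Congenital Anomaly",
    "RI": "Required Intervention to Prevent Permanent Impairment/Damage",
    "OT": "Other Serious (Important Medical Event)",
}


def seriousness_by_report(
    report_ids: Iterable[str], outcomes: Iterable[Dict[str, str]]
) -> Dict[str, str]:
    """Label each report Fatal / Serious / Non-serious from its outcome codes (hypothetical rule)."""
    codes: Dict[str, Set[str]] = {}
    for row in outcomes:
        code = row["outc_cod"].strip().upper()
        if code in OUTCOME_TERMS:  # ignore unrecognized codes
            codes.setdefault(row["primaryid"], set()).add(code)

    labels: Dict[str, str] = {}
    for pid in report_ids:
        report_codes = codes.get(pid, set())
        if "DE" in report_codes:
            labels[pid] = "Fatal"
        elif report_codes:
            labels[pid] = "Serious"  # every other listed outcome is a serious outcome
        else:
            labels[pid] = "Non-serious"  # no recognized outcome recorded for this report
    return labels
```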
Duplicate reports are a common problem when mining this data, because the database contains information submitted voluntarily by healthcare professionals, consumers, lawyers, and manufacturers. As a result, the same adverse event may be reported by several parties, and reports submitted by non-medical professionals may be more likely to contain incorrect information. Report Deduplication updates column names and removes duplicate records.
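One common two-step approach to FAERS deduplication is sketched below; it is not necessarily what Report Deduplication does. First, keep only the latest `caseversion` of each `caseid`. Second, drop reports from different cases that agree on a set of key fields. The key fields are a hypothetical choice, and `drugs` and `reactions` are assumed to be pre-aggregated per-report strings (for example, sorted names joined with a separator).

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, str]


def latest_case_versions(reports: Iterable[Row]) -> List[Row]:
    """Keep only the most recent version of each FAERS case (same caseid)."""
    latest: Dict[str, Row] = {}
    for r in reports:
        cur = latest.get(r["caseid"])
        if cur is None or int(r["caseversion"]) > int(cur["caseversion"]):
            latest[r["caseid"]] = r  # replacing a value keeps the key's original position
    return list(latest.values())


def drop_cross_reporter_duplicates(
    reports: Iterable[Row],
    key_fields: Sequence[str] = ("event_dt", "age", "sex", "reporter_country", "drugs", "reactions"),
) -> List[Row]:
    """Drop reports that agree on every key field (hypothetical key), keeping the first seen."""
    seen = set()
    kept: List[Row] = []
    for r in reports:
        key = tuple(r.get(f, "") for f in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept
```

A stricter key reduces false merges of genuinely distinct events; a looser key catches more duplicates submitted by different parties.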
The Data Analytics/Statistics Flow zone filters the data using user-specified variables and splits the entire dataset into cohort subpopulations. The final statistics used in the dashboard are generated by a Python script that prefilters on adverse event frequency, computes disproportionality statistics for each drug and adverse event pair, and outputs separate datasets for further comparison and signal detection.
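Two of the disproportionality statistics mentioned in the dashboard, the Proportional Reporting Ratio (PRR) and the Reporting Odds Ratio (ROR), can be computed from a 2x2 contingency table of report counts for each drug and adverse event pair. The sketch below uses the standard formulas with log-scale 95% confidence intervals; the minimum count of 3 used as the frequency prefilter is a common convention assumed here, not a value confirmed for this Solution, and the names are hypothetical.

```python
import math
from typing import NamedTuple, Optional


class DPAResult(NamedTuple):
    prr: float
    prr_lower: float
    prr_upper: float
    ror: float
    ror_lower: float
    ror_upper: float


def disproportionality(
    a: int, b: int, c: int, d: int, min_count: int = 3, z: float = 1.96
) -> Optional[DPAResult]:
    """PRR and ROR with 95% confidence intervals from a 2x2 contingency table.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with any other drug and the event
    d: reports with any other drug and any other event
    """
    # Prefilter: skip rare pairs and tables where either ratio is undefined
    if a < min_count or min(b, c, d) == 0:
        return None
    prr = (a / (a + b)) / (c / (c + d))
    se_ln_prr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    ror = (a * d) / (b * c)
    se_ln_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return DPAResult(
        prr, prr * math.exp(-z * se_ln_prr), prr * math.exp(z * se_ln_prr),
        ror, ror * math.exp(-z * se_ln_ror), ror * math.exp(z * se_ln_ror),
    )


result = disproportionality(20, 980, 200, 98800)
```

For these counts, 2% of the drug's reports mention the event versus about 0.2% for all other drugs, so the PRR is 9.9.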
The three output datasets from the previous Flow zone are brought into the Visualizations Flow zone, along with the output from Report Deduplication. This Flow zone processes the output datasets to generate warnings on potential drug adverse event signals and visual insights. The final datasets feed the graphs published in the Pharmacovigilance analytics dashboard.
Explore General Trends from Data Analysis#
The project’s dashboard provides insights into general trends in the data. There are both Descriptive Analytics and Statistical Analysis insights available in the Pharmacovigilance analytics dashboard to support decision-making. The first two tabs are dedicated to Descriptive Analytics:
Tab | Description
---|---
Metrics | Shows the number of records after data preprocessing and the top values for drugs, adverse events, and manufacturers. After importing the solution, all metrics in this tab will show errors. This is expected behavior and is resolved by clicking the Run Now button and refreshing the dashboard once the scenario completes.
Safety Report Analytics | Delivers visualizations of safety report trends by gender, age, seriousness, outcome, and reporters. Filters can be used to focus on specific column values from the visualized datasets.
The Statistical Analysis section consists of three tabs:
Tab | Description
---|---
Disproportionality Analysis | Analyzes potential warnings between drugs and adverse events through various disproportionality metrics (e.g., Proportional Reporting Ratio, Reporting Odds Ratio). Dashboard filters can be used to modify this analysis.
DPA by Gender | Analyzes each subpopulation in the data separately to enable comparative study. Top events and drugs measured for each gender reveal potential warnings for each group. Other metrics, such as seriousness and age, add levels of granularity for detecting populations that are likely at risk from different medications.
Safety Report Signals | Shows potential warnings for all preselected drug and adverse event pairs. Warnings are determined by a threshold on the lower bound of the 95% confidence interval of the Reporting Odds Ratio and the Proportional Reporting Ratio.
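The warning rule in Safety Report Signals can be sketched as a threshold test on precomputed lower confidence bounds. The default thresholds of 1.0 (the lower bound excluding "no disproportionality") and the option to require both metrics are assumptions; the Dataiku Application lets users adjust the actual signal thresholds. The field names `ror_lower` and `prr_lower` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def flag_signals(
    pairs: Iterable[Dict[str, Any]],
    ror_threshold: float = 1.0,
    prr_threshold: float = 1.0,
    require_both: bool = True,
) -> List[Dict[str, Any]]:
    """Return the drug/event pairs whose lower 95% CI bounds exceed the thresholds."""
    flagged = []
    for pair in pairs:
        hits = (pair["ror_lower"] > ror_threshold, pair["prr_lower"] > prr_threshold)
        # Either both metrics must clear their threshold, or at least one of them
        is_signal = all(hits) if require_both else any(hits)
        if is_signal:
            flagged.append(pair)
    return flagged


pairs = [
    {"drug": "A", "event": "Nausea", "ror_lower": 2.1, "prr_lower": 1.8},
    {"drug": "B", "event": "Rash", "ror_lower": 1.2, "prr_lower": 0.9},
    {"drug": "C", "event": "Headache", "ror_lower": 0.7, "prr_lower": 0.6},
]
```

Requiring both metrics produces fewer, more conservative warnings; accepting either one favors sensitivity.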
Keep Trends Updated#
This project has a Dataiku Application that enables any user to create a new instance that re-runs the entire Flow based on the user’s filter selection and updates the prebuilt dashboards. The application also allows users to export some output datasets for use in other Data Science workflows.
Setting up the application is done in three steps:
Upload the input files, with the option to reconfigure the connections.
Adjust filters and population selection for the Data Analytics/Statistics section of the Flow to better understand populations and events.
Specify the frequency of events and individual drugs/manufacturers to be examined, adjust signal generation thresholds, and calculate measures of disproportionality.
Reproduce these Processes with Minimal Effort#
The intent of this project is to enable Drug Safety and Surveillance stakeholders to understand how Dataiku can be used to easily integrate large amounts of data from spontaneous reporting systems, and just as easily push the resulting datasets into case management systems for investigation. With a single solution that can inform the decisions of many teams within one organization or across several organizations, stakeholders can use immediate insights to detect drug risks early, prevent patient harm, ensure safety in diverse populations, detect dangerous drug interactions, and get ahead of the lengthy regulatory process of drug recalls with early action.
We’ve provided several suggestions on how to integrate data from spontaneous reporting systems and extract actionable insights, but ultimately the “best” approach will depend on your specific needs. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.