Solution | Pharmacovigilance#

Overview#

Business case#

The efficiency of post-market drug safety surveillance functions plays a critical role in reinforcing patient safety and ensuring successful drug launches. Compliance with safety reporting and surveillance requirements (Good Pharmacovigilance Practice) is a mandatory regulatory obligation. Failure to appropriately report, detect, and address adverse drug reactions can result in patient harm, drug recalls, and significant costs.

As the volume, velocity, and variety of safety reporting data grow, it’s becoming essential for global safety teams at drug manufacturers, health outcomes research institutions, and regulatory bodies alike to adopt analytics-driven approaches that can be automated at scale to improve early signal detection and reliability in the pharmacovigilance process.

This plug-and-play Solution provides a ready-to-use interface to accelerate the discovery of potential adverse drug reaction (ADR) signals. It uses statistical methods to compute disproportionality metrics on drug and adverse event pairs across various populations.

Installation#

  1. From the Design homepage of a Dataiku instance connected to the internet, click + Dataiku Solutions.

  2. Search for and select Pharmacovigilance.

  3. If needed, change the folder into which the Solution will be installed, and click Install.

  4. Follow the modal to either install the technical prerequisites below or request an admin to do it for you.

Note

Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Technical requirements#

To leverage this Solution, you must meet the following requirements:

  • Have access to a Dataiku 13.3+ instance with a built-in Python3 environment (or create a Python3 code env).

  • To benefit natively from all the Dataiku application automation, you will need to reconfigure one of the following connections:

    • PostgreSQL

    • Snowflake

Data requirements#

Two managed folders contain the inputs of this Solution. Managed folders are helpful for storing data structures that standard Dataiku Flow objects don’t support.

  • Product Drug Names: Contains the Product.txt file, imported by default from the FDA Orange Book. It lists the drugs and pharmaceuticals that the U.S. Food and Drug Administration (FDA) has approved as both safe and effective.

  • Input Files: Requires at least five datasets as .txt files imported from the FDA Adverse Event Reporting System (FAERS). These datasets contain adverse event reports, medication error reports, and product quality complaints resulting in adverse events that were submitted to the FDA.
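As a hypothetical illustration of how the Orange Book products file can be read, here is a minimal Python sketch. The tilde delimiter and the abbreviated column names in the sample are assumptions about the public file layout, not taken from the Solution itself.

```python
import io

import pandas as pd

# Tiny sample mimicking the Orange Book products file layout
# (assumption: '~'-delimited text with a header row).
sample = b"Ingredient~DF;Route~Trade_Name\nIBUPROFEN~TABLET;ORAL~ADVIL\n"

# Parse the delimited text into a data frame.
products = pd.read_csv(io.BytesIO(sample), sep="~")
```

In the Solution, the parsed data frame would then be exported as a regular Dataiku dataset for use downstream in the Flow.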

Workflow overview#

You can follow along with the Solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow Zones.

The project has the following high-level steps:

  1. Ingest data files.

  2. Process the data, detect duplicate reports, and filter on demographic, drug, reaction, and report characteristics.

  3. Identify and visualize patterns in safety data.

  4. Calculate metrics for statistical inference and signal detection.

  5. Analyze new insights with a Dataiku application.

  6. Increase regulatory compliance with early detection of potential ADR signals.

Walkthrough#

Note

In addition to reading this document, it’s recommended to read the wiki of the project before beginning to get a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Plug and play FAERS quarterly data files#

You can upload the aforementioned input files to the Solution either directly into the managed folders or via the Dataiku application interface. Following upload, you can reconfigure the connections of the Flow and choose to anonymize the manufacturer and drug names for confidentiality reasons. The Input Files Flow zone applies two Python recipes to the initial managed folders:

  • compute_FDA_products: Parses the FDA standard drug name .txt file into a data frame and exports a dataset object.

  • Faers_data_ingestion: Parses the ASCII (.txt) files and converts them to data frames. The process checks file names against a pre-specified regex condition and maps column codes to standard terms.
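A simplified sketch of the kind of check-and-parse logic such a recipe performs. The file-name pattern and the `$` delimiter are assumptions about the FAERS quarterly ASCII file layout, and all names here are illustrative rather than taken from the Solution’s code:

```python
import io
import re

import pandas as pd

# Hypothetical regex for FAERS quarterly file names, e.g. DEMO24Q1.txt
FAERS_NAME_RE = re.compile(
    r"^(DEMO|DRUG|REAC|OUTC|INDI|THER|RPSR)\d{2}Q[1-4]\.txt$", re.IGNORECASE
)

def ingest_faers_file(name: str, raw: bytes) -> pd.DataFrame:
    """Validate the file name, then parse the '$'-delimited ASCII file."""
    if not FAERS_NAME_RE.match(name):
        raise ValueError(f"Unexpected FAERS file name: {name}")
    # Read every column as text; type casting happens later in the Flow.
    return pd.read_csv(io.BytesIO(raw), sep="$", dtype=str)

sample = b"primaryid$caseid$sex\n100001$90001$F\n100002$90002$M\n"
df = ingest_faers_file("DEMO24Q1.txt", sample)
```

Validating the name before parsing catches misplaced uploads early, before a malformed file can propagate bad rows into the rest of the Flow.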

Screenshot of the input and data prep Flow zones.

In this Flow zone, visual Distinct recipes are also used to keep the values necessary for running the remainder of the Flow and generating statistical analyses.

The Data Preparation (Drug) Flow zone extracts information from drug interactions and joins it with the indication dataset using visual Split, Group, and Join recipes.

Once you have imported and cleaned the initial data, you’re ready to begin aggregating the data.

Setting the data for visual insights#

The Data Standardization (Drug) Flow zone takes as input the previously prepared dataset of drug interactions joined with indications and a dataset of FDA product names. The Flow joins these two datasets together, and uses the FDA drug name to standardize any misspellings in the FAERS data.
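One way to standardize misspelled drug names against a reference list is fuzzy string matching. The sketch below uses Python’s standard-library `difflib`; the reference names, function, and cutoff are illustrative assumptions, not the Solution’s actual matching logic:

```python
import difflib

# Illustrative reference list of FDA-approved drug names
fda_names = ["IBUPROFEN", "ACETAMINOPHEN", "WARFARIN SODIUM"]

def standardize(raw_name: str, reference: list[str], cutoff: float = 0.8) -> str:
    """Map a possibly misspelled name to its closest reference entry.

    Falls back to the uppercased input when no match clears the cutoff.
    """
    match = difflib.get_close_matches(raw_name.upper(), reference, n=1, cutoff=cutoff)
    return match[0] if match else raw_name.upper()

standardize("ibuprofn", fda_names)  # close misspelling maps to "IBUPROFEN"
```

The cutoff controls the precision/recall trade-off: raising it leaves more misspellings unmatched, while lowering it risks mapping distinct drugs onto each other.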

Screenshot of the additional data prep Flow zones to make analysis possible

Moving along to the Data Preparation (Demographics) Flow zone, the Flow takes the FAERS reaction, outcome, and demographics datasets, plus the joined drug name dataset from the previous Flow zone, as inputs. A visual Prepare recipe applied to the demographic data cleans age, country, and date features.

The Flow then joins this to the other three datasets and computes a feature representing the seriousness of an event based on outcome codes. Additional recipes calculate metrics on the number of adverse events and anonymize the manufacturer and drug names (if selected in the Dataiku App).
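The seriousness feature can be sketched as a flag over FAERS outcome codes. Which codes count as serious (here, everything except `OT`, "other") is an assumption for illustration; the Solution may weigh outcomes differently:

```python
import pandas as pd

# FAERS outcome codes treated as serious in this sketch:
# DE=death, LT=life-threatening, HO=hospitalization, DS=disability,
# CA=congenital anomaly, RI=required intervention.
SERIOUS_CODES = {"DE", "LT", "HO", "DS", "CA", "RI"}

outcomes = pd.DataFrame(
    {"primaryid": [1, 1, 2, 3], "outc_cod": ["HO", "OT", "OT", "DE"]}
)

# An event is flagged serious if ANY of its outcome codes is in the set.
serious = (
    outcomes.assign(is_serious=outcomes["outc_cod"].isin(SERIOUS_CODES))
    .groupby("primaryid")["is_serious"]
    .any()
    .astype(int)
)
```

Because one report can carry several outcome codes, aggregating with `any()` per `primaryid` ensures a single serious outcome marks the whole event as serious.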

The database contains information voluntarily submitted by healthcare professionals, consumers, lawyers, and manufacturers. As a result, duplicates are common: multiple parties may report the same event, and reports submitted by non-medical professionals may be more likely to contain incorrect information. The Report Deduplication Flow zone updates column names and removes any duplicate records.
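A minimal deduplication sketch in pandas, assuming (hypothetically) that `caseid` identifies a report and that the latest FDA receipt date should win when the same case appears more than once:

```python
import pandas as pd

reports = pd.DataFrame(
    {
        "caseid": [9001, 9001, 9002],
        "fda_dt": ["20230101", "20230215", "20230110"],
        "drugname": ["IBUPROFEN", "IBUPROFEN", "WARFARIN"],
    }
)

# Keep only the most recent version of each case; YYYYMMDD strings
# sort chronologically, so a plain sort is sufficient here.
deduped = (
    reports.sort_values("fda_dt")
    .drop_duplicates(subset="caseid", keep="last")
    .reset_index(drop=True)
)
```

Deduplicating before computing statistics matters: duplicate reports inflate the counts that the disproportionality measures are built on.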

Screenshot of the Flow zones dedicated to Statistical Analysis and pre-processing for visualizations.

The Data Analytics/Statistics Flow zone filters data using user-specified variables and splits the entire dataset into cohort subpopulations. A Python script generates the final statistics used in the dashboard. This script:

  • Applies pre-filtering on adverse event frequency.

  • Computes disproportionality statistics for each drug and adverse event pair.

  • Outputs individual dataset objects used for further comparison and signal detection.
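The disproportionality computation can be sketched as follows. This is a minimal illustration of two widely used measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR); the Solution’s exact statistics may differ:

```python
import math

def disproportionality(a: int, b: int, c: int, d: int):
    """2x2 contingency counts for one drug/event pair:
    a = reports with the drug and the event
    b = reports with the drug, without the event
    c = reports with the event, without the drug
    d = reports with neither
    """
    # Proportional reporting ratio (PRR)
    prr = (a / (a + b)) / (c / (c + d))
    # Reporting odds ratio (ROR), with a 95% CI on the log scale
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (
        math.exp(math.log(ror) - 1.96 * se),
        math.exp(math.log(ror) + 1.96 * se),
    )
    return prr, ror, ci

prr, ror, ci = disproportionality(a=20, b=80, c=100, d=9800)
```

With these example counts, PRR = 19.8 and ROR = 24.5, both well above the usual screening thresholds, so this pair would surface as a candidate signal.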

The Visualizations Flow zone takes as input the three output datasets from the previous Flow zone, along with the output of Report Deduplication. This Flow zone processes the output datasets to generate warnings about potential drug adverse event signals and visual insights. The final datasets feed a number of graphs published in the Pharmacovigilance analytics dashboard.
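The warning logic can be illustrated with a commonly used screening rule. The thresholds below follow the well-known PRR criterion (PRR ≥ 2, chi-square ≥ 4, at least 3 reports); treating them as the Solution’s own cutoffs is an assumption:

```python
def is_signal(prr: float, chi2: float, n: int) -> bool:
    """Flag a drug/event pair as a candidate signal under a
    classic screening rule: PRR >= 2, chi-square >= 4, n >= 3 reports.
    Illustrative only; real thresholds should match your SOPs.
    """
    return prr >= 2 and chi2 >= 4 and n >= 3
```

Pairs flagged this way are candidates for investigation, not confirmed ADRs: disproportionality screening is a triage step ahead of clinical case review.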

Reproduce these processes with minimal effort#

The intent of this project is to enable drug safety and surveillance stakeholders to understand how they can use Dataiku to integrate large amounts of data from spontaneous reporting systems. Then, they can push the resulting datasets into case management systems for investigation.

By creating a singular Solution that can benefit and influence the decisions of a variety of teams in a single organization, or across multiple organizations, you can:

  • Use immediate insights to detect drug risks early.

  • Prevent patient harm.

  • Ensure safety in diverse populations.

  • Detect dangerous drug interactions.

  • Anticipate the lengthy regulatory process of drug recalls with early action.

This documentation has provided several suggestions on how to derive value from this Solution. Ultimately, however, the “best” approach will depend on your specific needs and data. If you’re interested in adapting this project to the specific goals and needs of your organization, Dataiku offers roll-out and customization services on demand.