Solution | Social Determinants of Health


Business Case

Social determinants of health (SDoH) are the conditions and environments where people live, learn, work, and play that affect a wide range of health and quality-of-life risks and outcomes. Research has shown that SDoH can account for up to 90% of health outcomes, whereas medical care accounts for only 10%-15%.

Understanding the social factors associated with the prevalence of health measures (including chronic diseases, behaviors, and outcomes) not only aligns with social responsibility programs but can also deliver ROI through improved patient outcomes by:

  • Identifying resources, therapeutics, and interventions for populations incorporating both social and disease risk vulnerabilities

  • Developing responsible patient-centric risk-adjusted payment or care models to ensure health equity

  • Improving operational, spending, and quality metrics for both precision preventive care and equitable therapeutic access

Hospitals, public and private health services systems, health insurers, government agencies, and pharmaceutical and medical device companies are increasingly expected to use population and community health insights, linking social vulnerabilities to health measure prevalence, to inform business practices that address disparities in health outcomes and therapeutic access.

With this solution, healthcare and life science professionals can accelerate the discovery of how SDoH disparities affect at-risk populations, enabling refined market access strategies for drug manufacturers, new coverage policies from payers, and improved facility outreach and care programs from health services.


Installation

The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.

Dataiku Cloud users should follow the instructions for installing solutions on cloud.

  1. The Cloud Launchpad will automatically meet the technical requirements listed below and add the Solution to your Dataiku instance.

  2. Once the Solution has been added to your space, move ahead to Data Requirements.

After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:

  1. On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Social Determinants of Health.

  2. Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.

Additional note for 12.1+ users

If you are using a Dataiku 12.1+ instance and are missing the technical requirements for this Solution, the popup below will appear, allowing admin users to install the requirements directly and non-admin users to request the installation of code environments and/or plugins for the Solution on their instance.

Admins can process these requests in the admin request center, after which non-admin users can re-trigger the installation of the Solution.

Screenshot of the Request Install of Requirements menu available for Solutions.

Technical Requirements

To leverage this solution, you must meet the following requirements:

  • Have access to a Dataiku 12.0+ instance.

  • Generate an API key, which is required to access the Census Data through its API service.

  • A Python 3.8 code environment named solution_soc-determinants-health with the following required packages:


Data Requirements

  1. This solution retrieves data from the relevant APIs through live endpoints. There are two sets of data:

  2. You can upload your own community health measure data through a Dataiku Application or directly to the New Health Measure Flow zone and generate the SDoH analysis. The required data schema is specified on the project wiki.

Workflow Overview

You can follow the solution in the Dataiku gallery.

Dataiku screenshot of the final project Flow showing all Flow zones.

The project has the following high-level steps:

  1. Ingest publicly available data.

  2. Prepare and clean data for analysis.

  3. Apply regression analysis to better understand how social factors are associated with rates of chronic diseases.

  4. Use clustering analysis to gain insights about areas with undetected or prevalent diseases.

  5. Upload new health measure data and extend the pre-built analysis.

  6. Build and explore solution outputs via easy-to-use dashboards.

  7. Apply rigorous responsible AI ethics to future modeling approaches.
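Step 3 fits regression models that relate social factors to disease prevalence. As a rough, dependency-free illustration only, the sketch below fits an ordinary least-squares model with an intercept, one model per health measure. It assumes a linear model and numeric feature rows (for example, SVI theme percentiles per tract); the Solution's actual model type and features may differ, and the helper name `fit_per_measure` is hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(features, targets):
    """Fit targets ~ b0 + b1*f1 + ... by least squares; returns [b0, b1, ...]."""
    X = [[1.0] + list(row) for row in features]  # prepend an intercept column
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def fit_per_measure(features, prevalence_by_measure):
    """Hypothetical helper: one regression model per health measure name."""
    return {name: fit_ols(features, y) for name, y in prevalence_by_measure.items()}
```

The normal equations keep the example self-contained; on real data, a least-squares solver such as `numpy.linalg.lstsq` is numerically more robust, especially when social vulnerability features are strongly correlated with one another.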



In addition to reading this document, we recommend reading the project wiki before beginning, to gain a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.

Building the Full Flow Made Simple

To ease the use of this solution, it comes with three pre-built Dashboards. The User Manual - Solution Build Dashboard provides instructions and scenarios that recursively build all the solution components, from data ingestion through processing, regression, segmentation, and visualization.

When uploading new health measures, you must format your data to match the required schema and run the scenarios in the specified order to build the analysis at the county or tract level. To retrieve the public data, generate an API key and store it in the project variables. The public data in this solution is updated only yearly, so the scenarios only need to run after each yearly update.
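The sketch below shows one way an API key held in project variables could be used to assemble a US Census Bureau Data API request. It assumes the key is a Census Data API key; the variable name `census_api_key`, the ACS 5-year dataset path, and the `B01003_001E` variable in the usage example are illustrative choices, not necessarily what the Solution requests.

```python
from urllib.parse import urlencode

# Hypothetical project-variable name; check the project wiki for the real one.
CENSUS_KEY_VARIABLE = "census_api_key"


def build_census_query(project_variables, year, dataset, variables, geography, within=None):
    """Build a US Census Data API request URL using a key from project variables."""
    api_key = project_variables.get(CENSUS_KEY_VARIABLE)
    if not api_key:
        raise KeyError(f"set the '{CENSUS_KEY_VARIABLE}' project variable first")
    params = {"get": ",".join(variables), "for": geography}
    if within:
        params["in"] = within
    params["key"] = api_key
    # Leave ':', '*' and ',' unescaped so the URL stays readable, as in the Census API examples.
    return f"https://api.census.gov/data/{year}/{dataset}?" + urlencode(params, safe=":*,")
```

For example, `build_census_query({"census_api_key": "MY_KEY"}, 2021, "acs/acs5", ["NAME", "B01003_001E"], "county:*", "state:06")` returns `https://api.census.gov/data/2021/acs/acs5?get=NAME,B01003_001E&for=county:*&in=state:06&key=MY_KEY`. Inside a Dataiku Python recipe, the variables dictionary would typically come from `dataiku.get_custom_variables()`, which keeps the key out of the code itself.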

The user has the following options for interacting with the solution:

  1. Explore and demo the prepackaged public community-level data.

  2. Upload new data at the county or tract level and automatically build the analysis and visualization deliverables.

Running the first scenario in the User Manual - Solution Build Dashboard updates all the visualizations for the pre-packaged public data and restarts the webapp backends. The Health Equity Webapp, the SDoH Analytics and Tract Segmentation Dashboard, and the Chronic Disease Prevalence Modeling Dashboard showcase the SDoH analysis for chronic diseases such as diabetes, cancer (excluding skin cancer), current asthma, and chronic kidney disease; preventive measures such as COVID-19 vaccination rates; and health outcomes such as depression, using input records from the Census Data - SVI factors, CDC Data, and Metadata Flow zones.

Dataiku screenshot of the User Manual Dashboard.

Discover US Community Patterns of Chronic Disease Prevalence and Social Vulnerability

The interactive Health Equity Webapp includes a US map outlining Census counties (or tracts, depending on the county or regional selection) colored by the selected health measure's prevalence, a scatter plot of community-level social vulnerability theme rankings versus health prevalence rankings, and a table of individual records.

Selecting counties or tracts (depending on the filter selection) in the scatter plot with a box or lasso selection dynamically displays the corresponding records, drawn from the data preprocessing and feature generation Flow zones, in the table below.
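The scatter plot compares rankings rather than raw values. A minimal sketch of how such rankings could be computed is shown below, assuming percentile ranks of the form (rank - 1) / (n - 1), with tied values sharing their average rank. The `spearman` summary (the Pearson correlation of the two rank vectors) is an illustrative addition, not a statistic the Webapp is documented to display.

```python
def percentile_ranks(values):
    """Percentile rank in [0, 1] as (rank - 1) / (n - 1); ties share the average rank."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [(r - 1) / (n - 1) for r in ranks]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(percentile_ranks(x), percentile_ranks(y))
```

Because percentile ranks are a linear rescaling of ordinary ranks, correlating them gives the same value as correlating the raw ranks, which is why `spearman` can reuse `percentile_ranks`.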

Screenshot of the SDoH Analytics and Tract Segmentation Dashboard Webapp

Two tabs are built into the SDoH Analytics and Tract Segmentation Dashboard. The first tab, Census Tract Segmentation, helps users better understand ML-driven tract segmentation, based solely on Social Vulnerability percentile values, through various model explainability visualizations. The second tab, Tract Segments and Disease, shows how the distribution of tracts across segments corresponds to each disease's prevalence.
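The segmentation relies solely on Social Vulnerability percentile values. The clustering algorithm is not named here, so the sketch below assumes plain k-means (Lloyd's algorithm) applied to one feature vector per tract, for example its four SVI theme percentiles. The function names and example data are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of feature tuples. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k randomly chosen input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each tract joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

k-means depends on its random initialization; fixing `seed` makes a run reproducible, and in practice several values of `k` and several initializations would be compared before interpreting the resulting segments.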

Screenshot of the Tract Segments and Disease tab

Visualize Associations with Social Vulnerability Factors Across Areas and Populations

The Chronic Disease Prevalence Modeling Dashboard shows two tabs.



Disease Prevalence Model Summary

Includes a standard webapp where you can select and save a disease for analysis, then press the Run button on the right. This triggers a scenario that activates the Regression Model for the selected condition and updates the tab with that model's Summary and Individual Explanations through interactive explainability charts.

Census Tract SHapley Additive exPlanations (SHAP)

Contains two charts that provide insights into how community social factors at the tract level contribute to that tract's disease prevalence prediction. Filters can be used to narrow the scope of the visualizations.
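For intuition about what SHAP values represent: when a model is linear and features are treated as independent, SHAP values have a closed form, phi_j = w_j * (x_j - mean(x_j)), and the base value plus all contributions equals the model's prediction for that tract. The sketch below assumes such a linear model with coefficients ordered intercept first; the models behind the dashboard may be non-linear, in which case their SHAP values are estimated differently.

```python
def linear_shap(coefficients, background, row):
    """Exact SHAP values for a linear model, assuming independent features.

    coefficients: [b0, w1, ..., wp] with the intercept first
    background:   feature rows used to estimate each feature's expected value
    row:          feature values of the tract being explained
    """
    intercept, weights = coefficients[0], coefficients[1:]
    n = len(background)
    means = [sum(r[j] for r in background) / n for j in range(len(weights))]
    # Base value: the model's prediction at the average feature values.
    base_value = intercept + sum(w * m for w, m in zip(weights, means))
    # Each feature's contribution: its weight times its deviation from the mean.
    contributions = [w * (x - m) for w, x, m in zip(weights, row, means)]
    return base_value, contributions
```

The additivity property (base value plus contributions equals the prediction) is what lets SHAP charts present a tract's prediction as a sum of per-factor pushes above or below the average.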

Screenshot of the Chronic Disease Prevalence Modeling Dashboard.

Plug In Your Own Community-Level Health Measure Data to Drive Social Determinants of Health Analysis

Explore and extend the pre-built SDoH analysis by uploading new health measure prevalence data at the US Census county and/or tract level. There are two ways to interact with this solution dynamically.

  • Following the instructions on the User Manual - Solution Build Dashboard, the user may replace the health_measure_tract and/or health_measure_county datasets with data in a format that strictly complies with the required structure (see Dataiku Application Requirements). To update the pre-built visualizations, the Tract Check Schema and Build New Data scenarios must then be run in sequence (the same applies at the county level). The first checks that the schema is valid. The second builds all the Flow zones needed to generate new metrics and run Regression Analysis for the uploaded health measures. This scenario also restarts the backend of the Health Equity Webapp, which is then automatically updated with the new health measure prevalence information.

    Dataiku screenshot of the user instruction for new data upload.
  • Alternatively, users can interact directly with the Dataiku Application, which accesses the SDoH solution and creates a new interactive instance. As with the User Manual Dashboard, the pre-built analysis is available for demos and exploration. In addition, its plug-and-play functionality lets users upload new data with minimal Flow interaction for targeted analysis of new health measure prevalence against social vulnerability factors.

    Dataiku screenshot of the user interactive application.
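Before new data is built, the Check Schema scenario validates the upload. The required schema itself is on the project wiki and is not reproduced here, so the sketch below uses hypothetical column names (`geo_id`, `measure`, `prevalence`) and assumes prevalence is a percentage between 0 and 100. It relies only on the fact that county FIPS codes have 5 digits (state + county) and tract GEOIDs have 11 (state + county + tract).

```python
# Hypothetical column names; the real required schema is on the project wiki.
REQUIRED_COLUMNS = ("geo_id", "measure", "prevalence")
GEO_ID_LENGTH = {"county": 5, "tract": 11}  # state(2) + county(3) [+ tract(6)]


def check_schema(rows, level):
    """Return a list of problems; an empty list means the upload looks valid."""
    if not rows:
        return ["dataset is empty"]
    missing = [c for c in REQUIRED_COLUMNS if c not in rows[0]]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    expected_len = GEO_ID_LENGTH[level]
    problems = []
    for i, row in enumerate(rows):
        geo_id = str(row["geo_id"])
        if not (geo_id.isdigit() and len(geo_id) == expected_len):
            problems.append(f"row {i}: geo_id {geo_id!r} is not a {expected_len}-digit FIPS code")
        try:
            value = float(row["prevalence"])
        except (TypeError, ValueError):
            problems.append(f"row {i}: prevalence {row['prevalence']!r} is not numeric")
            continue
        if not 0 <= value <= 100:
            problems.append(f"row {i}: prevalence {value} is outside 0-100")
    return problems
```

Reading the file with `csv.DictReader` keeps identifiers as strings; loading them as integers silently drops leading zeros (01001 becomes 1001), a common reason such a check fails. A build step would proceed only when the returned list is empty.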

Responsible AI Statement

This solution uses both analytics and ML-driven insights to help build an understanding of how patterns of social factors that characterize potentially vulnerable populations are associated with chronic disease prevalence at the regional population level. Any interpretation should always account for the limitations of the underlying data.

This is community-level survey data and should not be used to support misleading attributions about how an individual person's socioeconomic status, minority/ethnic background, or household situation predicts or informs potential disease occurrence or outcomes. Self-reported survey data is particularly subject to recall, social desirability, and non-response bias. Any decisions or actions driven by this analysis must consider these limitations, which may influence the distribution of the data.

Moreover, the disease associations relating to regional community-level characteristics should be used to promote and prioritize health equity and therapeutic access, rather than to reinforce or deepen disparities or biases in the health and life sciences systems where they are deployed. This solution can (and should) be extended to include additional data, such as HCP or pharmacy geolocation information, as well as individual-level (de-identified) patient behavioral and clinical data in regions identified as areas of potential disparity.

Any further models built to design personalized patient-care journeys, health outreach programs, pricing considerations, or therapeutic delivery should be evaluated through a rigorous responsible AI ethics process to ensure that no biases are propagated, all subpopulations are considered, and model interpretability and explainability are in place.
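One small, concrete piece of such an evaluation is checking whether a model serves some subpopulations noticeably worse than others. The sketch below computes mean absolute error per subgroup and flags groups whose error exceeds a chosen multiple of the best-served group's error. The group labels and the 1.5 tolerance are arbitrary assumptions, and this check is no substitute for a full responsible AI review.

```python
from collections import defaultdict


def error_by_group(groups, actual, predicted):
    """Mean absolute error of predictions for each subgroup label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, y, y_hat in zip(groups, actual, predicted):
        totals[g] += abs(y - y_hat)
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def flag_disparities(group_errors, tolerance=1.5):
    """Groups whose error exceeds `tolerance` times the best-served group's error."""
    best = min(group_errors.values())
    return sorted(g for g, e in group_errors.items() if e > tolerance * best)
```

If the best-served group has zero error, any group with non-zero error is flagged; a real review would also look at absolute error levels, group sizes, and uncertainty rather than a single ratio.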

Reproduce These Processes with Minimal Effort

The intent of this project is to enable healthcare and life science professionals to understand how Dataiku can be used to accelerate the discovery of how SDoH disparities affect at-risk populations. By creating a single solution that can benefit and influence the decisions of a variety of teams in one organization or across multiple organizations, immediate insights can be used to refine market access strategies for drug manufacturers, create new coverage policies from payers, and improve facility outreach and care programs from health services.

We’ve provided several suggestions on how to use publicly available data and extract actionable insights, but ultimately the “best” approach will depend on your specific needs. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.