Concept: Subpopulation Analysis

By the time we have a high-performing model, we will have spent considerable time on feature engineering, trying different algorithms, and tuning hyperparameters. Now that the model’s overall performance metrics look good, are we finished?

Before pushing our model into production, we might first want to investigate whether it performs equally well across different subpopulations.

If the model predicts outcomes better for one group than for another, it can produce biased results and unintended consequences once it is in production.

Note

We can use subpopulation analysis to check the fairness of a model across various subgroups of interest.

Graphic image describing concept of subpopulation.

A Dataiku prediction model report includes a tab for Subpopulation analysis. As with a partial dependence plot, we first need to choose one variable on which to compute the analysis.

Both categorical and numeric variables can be used to define subpopulations. After we select a variable, the first column of the table lists each of its unique values (or modalities) in the dataset. The table makes it easy to compare the overall performance of the model to its performance for each subgroup, across various metrics. We can always add more metrics to the default list.

Subpopulation analysis of one variable for an XGBoost model.
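To make the comparison in this table concrete, here is a minimal Python sketch of the kind of computation behind it: the same metrics are computed once on all rows and once per subgroup. It is not Dataiku's implementation. The function names (`classification_metrics`, `subpopulation_table`) and the toy data are hypothetical, and it assumes binary labels where 1 is the positive class.

```python
from collections import defaultdict


def classification_metrics(y_true, y_pred):
    """Return row count, accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = len(pairs)
    nan = float("nan")  # used when a metric is undefined (empty denominator)
    return {
        "count": n,
        "accuracy": (tp + tn) / n if n else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
    }


def subpopulation_table(groups, y_true, y_pred):
    """Compute the metrics for all rows, then for each subgroup separately."""
    by_group = defaultdict(lambda: ([], []))
    for g, t, p in zip(groups, y_true, y_pred):
        by_group[g][0].append(t)
        by_group[g][1].append(p)

    table = {"All": classification_metrics(y_true, y_pred)}
    for g, (ts, ps) in sorted(by_group.items()):
        table[g] = classification_metrics(ts, ps)
    return table


# Hypothetical toy data: one categorical variable with two modalities.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

table = subpopulation_table(groups, y_true, y_pred)
for name, row in table.items():
    print(name, row)
```

In this toy data, the overall accuracy of 0.75 hides a gap: group A is predicted perfectly, while group B reaches only 0.5 on accuracy, precision, and recall.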

If we had chosen a numeric variable instead of a categorical one, we would find its distribution divided into bins. In either case, the blue bars represent the percentage of the dataset that falls into each bin or category.
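As a rough illustration of binning a numeric variable, the sketch below splits values into equal-width bins and computes the percentage of rows in each bin. Equal-width binning is an assumption made for simplicity; the text does not say how Dataiku chooses its bins. The function names and the `ages` data are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins; return (edges, bin indices)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]

    def bin_index(v):
        if width == 0:  # all values identical: everything lands in the first bin
            return 0
        # The maximum value would otherwise fall just past the last bin.
        return min(int((v - lo) / width), n_bins - 1)

    return edges, [bin_index(v) for v in values]


def bin_shares(bin_ids, n_bins):
    """Return the percentage of rows falling into each bin."""
    counts = [0] * n_bins
    for b in bin_ids:
        counts[b] += 1
    total = len(bin_ids)
    return [100.0 * c / total for c in counts]


# Hypothetical numeric variable.
ages = [20, 22, 25, 28, 31, 38, 47, 60]

edges, bin_ids = equal_width_bins(ages, n_bins=4)
shares = bin_shares(bin_ids, n_bins=4)

print(edges)    # bin boundaries
print(bin_ids)  # bin index per row, usable as a subgroup label
print(shares)   # percentage of rows per bin
```

Each bin index can then be treated like a categorical modality when computing per-subgroup metrics. Note that sparsely populated bins (here, the two highest age ranges hold one row each) give noisy metric estimates.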

For a classification task, selecting a specific subgroup reveals that group’s density chart and confusion matrix.

Subpopulation analysis showing a density chart and confusion matrix for one subgroup of the subpopulation.
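The per-subgroup view can be approximated in a few lines of Python. The sketch below restricts the rows to one subgroup, builds its confusion matrix, and counts predicted probabilities per bin for each actual class. The histogram is a coarse stand-in for a smooth density chart, and the 0.5 decision threshold, function names, and toy data are all assumptions for illustration.

```python
def subgroup_confusion_matrix(groups, y_true, y_proba, subgroup, threshold=0.5):
    """Confusion matrix for one subgroup, predicting 1 when probability >= threshold."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for g, t, p in zip(groups, y_true, y_proba):
        if g != subgroup:
            continue
        pred = 1 if p >= threshold else 0
        if t == 1 and pred == 1:
            cm["tp"] += 1
        elif t == 0 and pred == 1:
            cm["fp"] += 1
        elif t == 1 and pred == 0:
            cm["fn"] += 1
        else:
            cm["tn"] += 1
    return cm


def probability_histogram(groups, y_true, y_proba, subgroup, n_bins=5):
    """Count predicted probabilities per bin, separately for each actual class."""
    hist = {0: [0] * n_bins, 1: [0] * n_bins}
    for g, t, p in zip(groups, y_true, y_proba):
        if g != subgroup:
            continue
        idx = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 goes in the last bin
        hist[t][idx] += 1
    return hist


# Hypothetical toy data: predicted probability of the positive class per row.
groups = ["A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 1, 0, 0, 1]
y_proba = [0.90, 0.20, 0.70, 0.45, 0.65, 0.15, 0.85]

cm_b = subgroup_confusion_matrix(groups, y_true, y_proba, subgroup="B")
hist_b = probability_histogram(groups, y_true, y_proba, subgroup="B")

print(cm_b)
print(hist_b)
```

When the two per-class histograms overlap heavily for a subgroup, the model separates the classes poorly for that group, which is the same signal the density chart conveys visually.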

In our use case, common metrics like ROC AUC, accuracy, precision, and recall appear to be quite close for the largest subgroups. There is no magic button, however, that can reveal whether or not our model is “fair”. It is ultimately up to us to decide what degree of difference between subgroups is meaningful for our use case and how we want to address those differences.
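One way to make that judgment explicit is to compute the gap in a chosen metric between the best and worst subgroups, ignoring groups too small to give a stable estimate. The sketch below is only an aid to that decision, not a fairness test. The `min_count` cutoff, the `tolerance` value, the function name, and the numbers are all hypothetical choices that would depend on the use case.

```python
def metric_gap(metric_by_group, counts, min_count=30):
    """Return (gap, best group, worst group) among groups with at least min_count rows."""
    eligible = {g: m for g, m in metric_by_group.items() if counts[g] >= min_count}
    if not eligible:
        raise ValueError("no subgroup has at least min_count rows")
    best = max(eligible, key=eligible.get)
    worst = min(eligible, key=eligible.get)
    return eligible[best] - eligible[worst], best, worst


# Hypothetical recall values and row counts per subgroup.
recall_by_group = {"A": 0.82, "B": 0.79, "C": 0.40}
counts = {"A": 500, "B": 450, "C": 12}

gap, best, worst = metric_gap(recall_by_group, counts)

tolerance = 0.05  # hypothetical threshold chosen by the analyst
needs_review = gap > tolerance

print(f"gap={gap:.2f} between {best} and {worst}; needs review: {needs_review}")
```

Here group C is excluded because it has only 12 rows, so the gap of about 0.03 is measured between A and B. Whether a small group such as C should be excluded or investigated separately is itself a decision for the analyst.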