Concept: Individual Explanations
Alongside partial dependence plots and subpopulation analysis, individual prediction explanations are one of the most powerful features of the Explainable AI toolkit in Dataiku DSS.
The Individual Explanations tab in DSS lets us quickly compute feature contributions for the most extreme predictions, that is, the records with the lowest and highest predicted probabilities.
We can also calculate individual explanations for every record in a dataset using the Score recipe. To learn more, see the Scoring Data section of the Scoring Basics course.
These explanations make it easier to communicate the reasons for a specific prediction to business users and allow us to check for any potential biases in the model.
For example, banks often use custom algorithms to determine whether a client will default on a new loan. In the interest of transparency and fairness, banks are required to report the reasons a client was denied a loan. To provide these reasons, banks can use individual explanations to show which features were most important in the prediction for that specific client.
Let’s look closer at how to arrive at these explanations in Dataiku DSS.
After training a prediction model, we can find the Individual Explanations tab in the Interpretation section of the model report.
Here, we see the probability distribution of all predictions in the test set. On the left side are the records with the lowest predicted probabilities, and on the right are those with the highest. Moving the sliders changes the number of records at each end of the probability distribution for which explanations are calculated.
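Conceptually, the two sliders pick a number of records from each end of the sorted probability distribution. The minimal Python sketch below illustrates that selection, assuming we already have the predicted probabilities for the test set as a plain list; `select_extremes` is a hypothetical helper for illustration, not part of the DSS API, and the probabilities are made up.

```python
from typing import List, Sequence, Tuple


def select_extremes(
    probabilities: Sequence[float], n_low: int, n_high: int
) -> Tuple[List[int], List[int]]:
    """Return the indices of the n_low lowest and n_high highest predictions.

    Hypothetical helper that mimics what the two sliders select; not a DSS API.
    """
    # Sort record indices from lowest to highest predicted probability
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    lowest = order[:n_low]
    # Walk the sorted order backwards so the highest prediction comes first
    highest = order[::-1][:n_high]
    return lowest, highest


# Illustrative predicted probabilities for six test-set records (made-up values)
probs = [0.12, 0.87, 0.45, 0.03, 0.95, 0.60]
lowest, highest = select_extremes(probs, n_low=2, n_high=2)
print("lowest:", lowest)    # lowest: [3, 0]
print("highest:", highest)  # highest: [4, 1]
```

In this sketch, if `n_low + n_high` exceeds the number of records, the same record can appear in both lists.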
Above the probability distribution, we can choose how many of the most influential features to include in each explanation, as well as the computation method DSS should use. ICE computes faster, while Shapley values may be more accurate. For more information, see the reference documentation.
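To build intuition for the speed/accuracy trade-off, the sketch below compares two ways of attributing one prediction to its features. It does not reproduce DSS's implementation (the reference documentation describes that). The "ICE-style" function is an assumed one-feature-at-a-time formulation in the spirit of Individual Conditional Expectation: the prediction minus the average prediction when only that feature is replaced by values from a small background sample. The Shapley function computes exact interventional Shapley values by enumerating every coalition of features, so its cost grows as 2**n_features. The model, record, and background rows are hypothetical.

```python
from itertools import combinations
from math import factorial
from statistics import mean
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def ice_style_contributions(model: Model, x: Sequence[float],
                            background: List[Sequence[float]]) -> List[float]:
    """One-feature-at-a-time contributions (assumed ICE-style formulation).

    For each feature i, compare the prediction for x with the average prediction
    obtained when only feature i takes its value from each background row.
    Cost: n_features * len(background) extra model calls.
    """
    base = model(x)
    contributions = []
    for i in range(len(x)):
        perturbed = [
            model([b[i] if j == i else x[j] for j in range(len(x))])
            for b in background
        ]
        contributions.append(base - mean(perturbed))
    return contributions


def coalition_value(model: Model, x: Sequence[float],
                    background: List[Sequence[float]], subset: Set[int]) -> float:
    """Average prediction when features in `subset` keep x's values and the
    remaining features take their values from each background row."""
    return mean(
        model([x[j] if j in subset else b[j] for j in range(len(x))])
        for b in background
    )


def shapley_values(model: Model, x: Sequence[float],
                   background: List[Sequence[float]]) -> List[float]:
    """Exact Shapley values by enumerating every coalition of the other features.

    The number of coalitions grows as 2**n_features, which is why exact Shapley
    values are much more expensive than the one-at-a-time approach above.
    """
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi += weight * gain
        values.append(phi)
    return values


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical scoring function with an interaction between features 0 and 2
    return 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


record = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
ice = ice_style_contributions(toy_model, record, background)
shap = shapley_values(toy_model, record, background)
print("ICE-style:", [round(v, 3) for v in ice])
print("Shapley:  ", [round(v, 3) for v in shap])
```

In this toy example, the Shapley values (about 0.4, -0.45, 0.35) sum exactly to the prediction minus the average background prediction (0.5 - 0.2 = 0.3). The one-at-a-time contributions (about 0.55, -0.45, 0.5) sum to 0.6 because the x0*x2 interaction is credited to both features. For a purely additive model the two approaches would agree.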
When we click Compute, DSS generates explanations for the chosen records, which then appear in a list beneath the probability distribution.
From the drop-down menu on the right-hand side, we can select the column that identifies each record, such as Encounter ID. On the left side, we see the record with the lowest predicted probability.
The three most influential features behind the model’s prediction for this record were discharge disposition, number of diagnoses, and number of outpatient visits.
Each feature has a bar showing the direction and magnitude of its influence. In this case, all three features push the prediction down, with discharge disposition having the strongest influence.
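Given per-feature contributions for one record, picking the most influential features and their direction amounts to ranking by absolute value and reading off the sign. The sketch below assumes contributions are available as a plain dictionary; the function names are hypothetical, and the contribution values are invented to mirror the pattern described above (they are not output from DSS).

```python
from typing import Dict, List, Tuple


def top_k_explanations(contributions: Dict[str, float],
                       k: int = 3) -> List[Tuple[str, float, str]]:
    """Return the k features with the largest absolute contribution,
    together with the direction in which each one moves the prediction."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    result = []
    for name, value in ranked[:k]:
        if value > 0:
            direction = "increases"
        elif value < 0:
            direction = "decreases"
        else:
            direction = "no effect"
        result.append((name, value, direction))
    return result


def render_bars(explanations: List[Tuple[str, float, str]], width: int = 20) -> str:
    """Draw one text bar per feature, scaled to the largest magnitude
    ('+' for a positive contribution, '-' for a negative one)."""
    if not explanations:
        return ""
    largest = max(abs(value) for _, value, _ in explanations) or 1.0
    lines = []
    for name, value, _ in explanations:
        symbol = "+" if value > 0 else "-"
        bar = symbol * round(abs(value) / largest * width)
        lines.append(f"{name:<24} {value:+.2f} {bar}")
    return "\n".join(lines)


# Invented contributions for one record (not output from DSS)
record_contributions = {
    "discharge_disposition": -0.21,
    "number_diagnoses": -0.12,
    "number_outpatient": -0.08,
    "feature_x": 0.03,
    "feature_y": -0.01,
}
top = top_k_explanations(record_contributions, k=3)
print(render_bars(top))
```

Ranking by absolute value matters: a large negative contribution is just as influential as a large positive one, which is why the bars encode both direction and magnitude.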
Scanning the results, we may notice that the most influential features are not the same for every record. Seeing these differences across the data gives us an opportunity to investigate potential biases in the model.
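One simple way to follow up on these differences is to tally which features dominate across the explained records, and to list the records in which a feature we consider sensitive ranks near the top. The sketch assumes explanations are already available as a dictionary mapping a record id to its per-feature contributions (for example, after computing them with the Score recipe and loading them into Python); the record ids, feature names, values, and helper functions are all hypothetical. A sensitive feature appearing often is a prompt for closer review, not proof of bias.

```python
from collections import Counter
from typing import Dict, List

Explanations = Dict[str, Dict[str, float]]  # record id -> {feature: contribution}


def top_features(contributions: Dict[str, float], k: int) -> List[str]:
    """Names of the k features with the largest absolute contribution."""
    return sorted(contributions, key=lambda f: abs(contributions[f]), reverse=True)[:k]


def top_feature_frequency(explanations: Explanations, k: int = 3) -> Counter:
    """Count how often each feature ranks among the top k across records."""
    counts: Counter = Counter()
    for contributions in explanations.values():
        counts.update(top_features(contributions, k))
    return counts


def records_driven_by(explanations: Explanations, feature: str, k: int = 3) -> List[str]:
    """Ids of records whose top-k explanation includes `feature`."""
    return [rid for rid, contributions in explanations.items()
            if feature in top_features(contributions, k)]


# Invented explanations for three records (hypothetical ids and features)
explanations = {
    "rec_1": {"feature_a": -0.30, "feature_b": -0.10, "feature_c": 0.05},
    "rec_2": {"feature_a": 0.20, "feature_b": 0.02, "feature_c": 0.25},
    "rec_3": {"feature_a": 0.18, "feature_b": -0.15, "feature_c": 0.02},
}
print(top_feature_frequency(explanations, k=2))
print(records_driven_by(explanations, "feature_c", k=2))
```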