Solution | Credit Risk Stress Testing (CECL, IFRS9)#
Overview#
Business Case#
Precise modeling of credit loss is a critical regulatory activity for all financial institutions. It requires efficient forecasting for each credit portfolio, rapid aggregation and analysis of the overall impact on the company balance sheet, and guaranteed end-to-end governance.
Although this is a well-established exercise, the work can consume large amounts of effort from Risk and Finance teams, and often runs on a patchwork of systems with outmoded architectures. Whether for regular or ad hoc reviews, the process requires running large and varied datasets through complex transformations and iterative modeling steps to obtain robust final figures. The challenges are well understood: risk managers have highlighted legacy IT systems (86%) and a lack of easily accessible, high-quality data (63%) as core challenges in prior years, with pressure only growing since. Teams face a clear decision point between doubling down on investments in legacy systems and reorienting to a modernized risk management estate.
Dataiku’s Credit Risk Stress Testing solution supports standardized credit risk portfolio modeling exercises (including CECL & IFRS9) and ad hoc credit stress testing analyses. It offers Risk and Finance teams an opportunity to confidently transition to a modernized data pipeline and modeling approach, delivering improved efficiency and flexibility alongside full governance and auditability. The solution can be scaled across credit portfolios in only a few steps to create comprehensive credit exposure modeling with extensive governance.
Installation#
The process to install this solution differs depending on whether you are using Dataiku Cloud or a self-managed instance.
Dataiku Cloud users should follow the instructions for installing solutions on cloud.
The Cloud Launchpad will automatically meet the technical requirements listed below, and add the Solution to your Dataiku instance.
Once the Solution has been added to your space, move ahead to Data Requirements.
After meeting the technical requirements below, self-managed users can install the Solution in one of two ways:
On your Dataiku instance connected to the internet, click + New Project > Dataiku Solutions > Search for Credit Risk Stress Testing (CECL, IFRS9).
Alternatively, download the Solution’s .zip project file, and import it to your Dataiku instance as a new project.
Additional note for 12.1+ users
If you are using a Dataiku 12.1+ instance and are missing the technical requirements for this Solution, the popup below will appear. It allows admin users to install the requirements directly, and non-admin users to request installation of the code environments and/or plugins on their instance.
Admins can process these requests in the admin request center, after which non-admin users can re-trigger the installation of the Solution.
Technical Requirements#
To leverage this solution, you must meet the following requirements:
Have access to a Dataiku 12.0+* instance.
Install the Time Series Preparation Plugin (if not yet installed).
A Python code environment named solution_credit-stress-testing with the following required packages:
kaleido==0.2.1
plotly==5.14.1
scipy==1.7.3
Data Requirements#
The project is initially shipped with all datasets using the filesystem connection.
The mortgage data comes from Freddie Mac. We need the historical performance of the loans to understand how they were affected by economic conditions.
Economic data, both historical and forecast, comes from the Federal Reserve.
Housing Price Indices by State come from the Federal Housing Finance Agency.
This exhaustive historical data is needed to build the models. However, when running the ECL computation, historical data is no longer needed: the models are applied to the most recent snapshot against the economic scenarios.
Workflow Overview#
You can follow along with the solution in the Dataiku gallery.
The project has the following high-level steps:
Build Probability of Default and Loss Given Default models using both visual and code tools.
Explore your models with pre-built visualizations.
Industrialize the Expected Credit Losses runs through the Project Setup.
Walkthrough#
Note
In addition to reading this document, we recommend reading the project wiki before beginning. It gives a deeper technical understanding of how this Solution was created and more detailed explanations of Solution-specific vocabulary.
Gather Input Data and Prepare for Training#
Both the Through-the-Cycle (TTC) matrix and the Point-in-Time (PIT) matrices are built using visual recipes, along with the $`x`$ density coordinates of each bin, which are used afterwards to compute the z-score.
A Window recipe computes for each loan the credit status for the next period.
Two Group recipes will compute the matrices:
One Group recipe takes care of the TTC matrix by aggregating using credit status and next credit status as keys.
The other handles the PIT by adding the quarter as a key.
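As a rough illustration of the Window and Group steps above, here is a minimal pandas sketch. It is not the Solution's visual recipes; the column names (`loan_id`, `quarter`, `credit_status`, `upb`) and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical loan-level panel: one row per loan and quarter.
loans = pd.DataFrame({
    "loan_id": ["A", "A", "A", "B", "B", "B"],
    "quarter": ["2020Q1", "2020Q2", "2020Q3"] * 2,
    "credit_status": ["current", "current", "delinquent",
                      "current", "delinquent", "default"],
    "upb": [100.0, 98.0, 96.0, 200.0, 199.0, 198.0],
})

# Window step: for each loan, look up the credit status of the next period.
loans = loans.sort_values(["loan_id", "quarter"])
loans["next_credit_status"] = loans.groupby("loan_id")["credit_status"].shift(-1)

# The last observation of each loan has no next period and is dropped.
transitions = loans.dropna(subset=["next_credit_status"])

# Group step 1 (TTC): aggregate UPB by status and next status over all quarters.
ttc = (transitions
       .groupby(["credit_status", "next_credit_status"], as_index=False)["upb"]
       .sum())

# Group step 2 (PIT): same aggregation with the quarter as an additional key.
pit = (transitions
       .groupby(["quarter", "credit_status", "next_credit_status"], as_index=False)["upb"]
       .sum())
```

The only difference between the two aggregations is the extra `quarter` key, which is why the two outputs can be stacked and processed together afterwards.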
The two sets of matrices are stacked back together and then go through a series of visual recipes to compute the probabilities of transitions and their equivalent as $`x`$ density coordinates:
The total UPB is computed for each transition starting point, the key being quarter and credit status.
The probability is computed as the ratio between the transition UPB and the total UPB.
For each transition starting point the cumulative probability is computed.
Cumulative probabilities are converted into the normal density.
The previous bin is retrieved for each row.
Finally, the initial normal density coordinate is set to -Infinity for each transition starting point.
The two types of transition matrices are split back.
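The chain of steps above can be sketched in pandas as follows. This is an illustration under assumptions: the conversion to a normal coordinate is taken to be the inverse of the standard normal cumulative distribution function (`scipy.stats.norm.ppf`), rows are assumed to be already ordered by bin within each starting point, and all column names and values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Hypothetical stacked transition table (one quarter, one starting status).
df = pd.DataFrame({
    "quarter": ["2020Q1"] * 3,
    "credit_status": ["current"] * 3,
    "next_credit_status": ["current", "delinquent", "default"],
    "upb": [900.0, 80.0, 20.0],
})
keys = ["quarter", "credit_status"]  # one transition starting point

# Total UPB per starting point, then probability = transition UPB / total UPB.
df["total_upb"] = df.groupby(keys)["upb"].transform("sum")
df["probability"] = df["upb"] / df["total_upb"]

# Cumulative probability within each starting point (clipped against rounding).
df["cum_probability"] = df.groupby(keys)["probability"].cumsum().clip(upper=1.0)

# Upper bin edge on the standard normal scale; a cumulative 1.0 maps to +inf.
df["x_upper"] = norm.ppf(df["cum_probability"])

# Lower bin edge = previous bin's upper edge; the first bin starts at -inf.
df["x_lower"] = df.groupby(keys)["x_upper"].shift(1).fillna(-np.inf)
```

Each row then carries the pair of coordinates that delimit its bin, which is the form needed when the z-score is extracted later.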
In parallel, macroeconomic data is prepared to be joined with the credit data and build the Probability of Default model.
The macroeconomic indicators are the economic time series that relate to the credit cycle. Some of these variables will be used as predictors in the probability of default model.
The economic data also comes with forecast scenarios. These may include a baseline scenario and a more adverse one, as in our example. The gap between scenarios shows up to a different degree depending on the variable.
A correlation matrix was built using a Statistics card. It provides a fast and visual way to identify the groups of variables that are closely correlated to each other, and to get a first view of the variables that are more likely to predict the credit variable z-score. We will add lags on each of the economic variables to find the right delay between the economic observation and the actual credit consequence.
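For readers who prefer code, lagging and a first correlation screen might look like the pandas sketch below. It stands in for the visual tooling rather than reproducing it; the series names and values are synthetic and purely illustrative.

```python
import pandas as pd

# Synthetic quarterly series; names and values are illustrative only.
macro = pd.DataFrame({
    "quarter": ["2019Q1", "2019Q2", "2019Q3", "2019Q4",
                "2020Q1", "2020Q2", "2020Q3", "2020Q4"],
    "bbb_yield": [4.1, 4.0, 3.9, 3.8, 4.5, 4.2, 3.7, 3.5],
    "vix": [15.0, 14.0, 16.0, 13.0, 40.0, 30.0, 25.0, 22.0],
    "z_score": [0.3, 0.4, 0.2, 0.5, -1.5, -0.8, -0.2, 0.1],
}).set_index("quarter")

# Add lagged copies of each economic variable (lag k = value k quarters earlier).
max_lag = 3
for col in ["bbb_yield", "vix"]:
    for k in range(1, max_lag + 1):
        macro[f"{col}_lag{k}"] = macro[col].shift(k)

# Correlation of every candidate predictor with the z-score, sorted.
corr_with_z = macro.corr()["z_score"].drop("z_score").sort_values()
```

Comparing the correlation of a variable with that of its lags gives a first hint of the delay between an economic observation and its credit consequence.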
Building a Probability of Default model using Z-score Modeling#
The Z-score is built from the transition matrices. A linear model is then trained on the macroeconomic data, and the forecasts are scored against this model.
First, the TTC transition matrix dataset is joined to the PIT transition matrices datasets. This will be the input of the Z-score extraction.
The Z-score extraction is achieved in the Python recipe:
The objective function for each quarter is defined: this is what \(z\) optimizes.
The global objective function on \(z\) variance is defined: this is what \(\rho\) optimizes.
First \(\rho\) is optimized and saved as a project variable.
Then, using this value, \(z\) is optimized and saved in the output dataset.
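The two-level optimization described above can be sketched with SciPy as follows. The exact objective functions used in the Solution's Python recipe are not given here, so this sketch assumes a common one-factor form: bin edges are shifted by \(\sqrt{\rho}\,z\) and rescaled by \(\sqrt{1-\rho}\), \(z\) minimizes the squared gap to the observed PIT probabilities, and \(\rho\) is chosen so that the variance of \(z\) is 1. All function names and data are hypothetical, and bins are assumed to be ordered from worst to best.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm


def fitted_probs(x_lower, x_upper, z, rho):
    """Bin probabilities implied by the TTC bin edges for a given z and rho."""
    shift, scale = np.sqrt(rho) * z, np.sqrt(1.0 - rho)
    return norm.cdf((x_upper - shift) / scale) - norm.cdf((x_lower - shift) / scale)


def quarter_objective(z, x_lower, x_upper, observed, rho):
    """Per-quarter objective: squared gap between observed and fitted probabilities."""
    return float(np.sum((observed - fitted_probs(x_lower, x_upper, z, rho)) ** 2))


def extract_z(rho, x_lower, x_upper, pit_by_quarter):
    """Optimize z separately for each quarter at a fixed rho."""
    return np.array([
        minimize_scalar(quarter_objective, bounds=(-6.0, 6.0), method="bounded",
                        args=(x_lower, x_upper, observed, rho)).x
        for observed in pit_by_quarter
    ])


def global_objective(rho, x_lower, x_upper, pit_by_quarter):
    """Global objective: squared gap between the variance of z and 1 (assumed target)."""
    return (np.var(extract_z(rho, x_lower, x_upper, pit_by_quarter)) - 1.0) ** 2


# Synthetic TTC row for one starting status (default, delinquent, current).
ttc_probs = np.array([0.02, 0.08, 0.90])
x_upper = norm.ppf(np.append(np.cumsum(ttc_probs)[:-1], 1.0))  # last edge is +inf
x_lower = np.concatenate(([-np.inf], x_upper[:-1]))

# Observed PIT rows generated from known values so the fit can be checked.
true_rho, true_z = 0.1, np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
pit_by_quarter = [fitted_probs(x_lower, x_upper, z, true_rho) for z in true_z]

# Step 1: optimize rho. Step 2: optimize z using that rho.
rho_hat = minimize_scalar(global_objective, bounds=(0.02, 0.6), method="bounded",
                          args=(x_lower, x_upper, pit_by_quarter)).x
z_hat = extract_z(rho_hat, x_lower, x_upper, pit_by_quarter)
```

Because the synthetic PIT rows are generated from known values, `rho_hat` and `z_hat` should land close to `true_rho` and `true_z`, which makes the sketch easy to sanity-check before adapting it.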
The extracted Z-score is joined to the prepared macroeconomic data to create the input of the linear regression model that is handled using the visual ML module.
We are looking to fit an ordinary linear regression to predict the Z-score, using some of the macroeconomic variables or their lags that have been previously prepared. To select the variables, we iterate on the feature reduction methods to find the top three variables that are most often relevant. We end up using these three variables:
The BBB Corporate Yield lagged twice.
The Mortgage Interest Rate lagged three times.
The Market Volatility Index lagged once.
This selection can be adapted based on the user’s expertise or the need for the model to be sensitive to some specific features, like inflation. These three features are selected in the feature handling and the final model run using them, without any additional feature reduction.
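Outside the visual ML module, an ordinary least squares fit on three lagged predictors could be sketched with NumPy as below. The data is synthetic and generated from made-up coefficients purely so the fit can be checked; the column meanings in the comments are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_quarters = 40

# Synthetic predictors; columns stand for BBB yield (lag 2),
# mortgage rate (lag 3) and volatility index (lag 1).
X = rng.normal(size=(n_quarters, 3))

# Synthetic target built from made-up coefficients plus small noise.
made_up_intercept, made_up_coef = 0.2, np.array([-0.5, -0.3, -0.8])
z = made_up_intercept + X @ made_up_coef + rng.normal(scale=0.05, size=n_quarters)

# Ordinary least squares: prepend a column of ones for the intercept.
A = np.column_stack([np.ones(n_quarters), X])
coef, *_ = np.linalg.lstsq(A, z, rcond=None)

# Scoring a forecast scenario is a matrix product with the fitted coefficients.
X_forecast = rng.normal(size=(4, 3))
z_forecast = np.column_stack([np.ones(len(X_forecast)), X_forecast]) @ coef
```

With only three predictors and no further feature reduction, the fitted coefficients remain easy to read and to challenge.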
The trained model is then used to score the forecasts and obtain the predicted Z-scores. These are joined to the TTC transition matrix to construct the predicted forecast matrices, which are built in the final Prepare recipe.
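One way to picture how a predicted Z-score turns a TTC matrix into a forecast matrix is the sketch below. It assumes the same one-factor shift of the bin edges (by \(\sqrt{\rho}\,z\), rescaled by \(\sqrt{1-\rho}\)) and is not the Solution's Prepare recipe; `forecast_matrix`, the matrix values and the value of `rho` are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def forecast_matrix(ttc, z, rho):
    """Shift the TTC bin edges of every row by the predicted z (assumed one-factor form)."""
    cum = np.clip(np.cumsum(ttc, axis=1), 0.0, 1.0)
    cum[:, -1] = 1.0                      # guard against rounding in the last bin
    upper = norm.ppf(cum)                 # upper bin edges, +inf for the last bin
    lower = np.hstack([np.full((ttc.shape[0], 1), -np.inf), upper[:, :-1]])
    shift, scale = np.sqrt(rho) * z, np.sqrt(1.0 - rho)
    return norm.cdf((upper - shift) / scale) - norm.cdf((lower - shift) / scale)


# Hypothetical TTC matrix; rows and columns ordered default, delinquent, current.
ttc = np.array([
    [1.00, 0.00, 0.00],   # default is absorbing
    [0.10, 0.60, 0.30],
    [0.02, 0.08, 0.90],
])

benign = forecast_matrix(ttc, z=1.0, rho=0.1)
adverse = forecast_matrix(ttc, z=-1.0, rho=0.1)
```

With this bin ordering, a lower predicted Z-score moves probability mass toward default, so the adverse matrix shows a higher current-to-default probability than the benign one.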
Modeling Loss Given Default using Peak-to-Trough#
The Peak-to-Trough modeling approach for Loss Given Default (LGD) represents a conservative strategy that delves into historical worst-case scenarios for asset value declines. This method involves calculating the most significant difference between consecutive peaks and troughs in asset values. This approach is particularly pertinent as defaults often tend to spike during these trough periods.
These calculations can be performed on multiple levels, including geographical and asset class considerations. For instance, in the context of mortgages, assets can be further categorized into various subgroups like houses and flats, urban and rural properties, etc. In this project, we focused on housing prices at the state level in the United States, revealing significant variations in Peak-to-Trough ratios among different states, as illustrated in the graph below.
House Price indices are available for all states, and upon examination, it becomes evident that certain time series exhibit more significant fluctuations compared to others. These states with more pronounced Peak-to-Trough variations are generally considered to be riskier in terms of housing market stability.
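A Peak-to-Trough ratio can be computed per state with a running maximum, as in the pandas sketch below. The exact definition used in the Solution may differ; here the ratio is assumed to be the largest relative drop from a previous peak. State codes and index values are made up.

```python
import pandas as pd

# Made-up house price index series for two fictional states.
hpi = pd.DataFrame({
    "state": ["AA"] * 6 + ["BB"] * 6,
    "hpi": [100, 120, 150, 90, 110, 160,
            100, 105, 110, 104, 108, 115],
})


def peak_to_trough(series):
    """Largest relative decline from the running peak of the series."""
    running_peak = series.cummax()
    drawdown = (running_peak - series) / running_peak
    return drawdown.max()


# One ratio per state; a higher value indicates a deeper historical decline.
ptt = hpi.groupby("state")["hpi"].apply(peak_to_trough)
```

In this toy data, state AA falls from 150 to 90 (a ratio of 0.4) while state BB only falls from 110 to 104, so AA would be treated as the riskier housing market.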
Industrialize your ECL process through a Project Setup#
The project setup lets a user from a monitoring or execution team perform the needed ECL runs. Once models have been approved and delivered, they can be applied to compute ECL through a no-code interface similar to the one designed here.
The ECL Run section lets the user select the cut-off date, the ID of the PD model and the scenarios to run.
A dashboard that provides information about the Data Quality checks and the ECL Results summarizes the output.
The second section contains an interface to select a starting and a closing configuration, between which a waterfall chart will be built. The [Compute Waterfall](scenario:COMPUTEWATERFALL) scenario sequentially calls the Run Simulation scenario for each required configuration, and the results are compiled in the [waterfall](dataset:waterfall) dataset.
The start configuration is displayed on the left while the closing configuration is on the right.
Both configurations must define a portfolio date, a model and a scenario. Hence, the waterfall chart will compare the two final ECL values through these three potential variations: portfolio, model and scenario.
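The logic of such a waterfall can be sketched as a sequence of runs in which one setting at a time is switched from the start to the closing configuration. The order of the switches, the `waterfall_steps` helper and the toy `run_ecl` function are all assumptions made for illustration; they do not describe the Solution's scenarios.

```python
def waterfall_steps(start, end, run_ecl, order=("portfolio", "model", "scenario")):
    """Switch one setting at a time from start to end and record each ECL change."""
    config = dict(start)
    previous = run_ecl(config)
    steps = [("start", previous)]
    for key in order:
        config[key] = end[key]           # change a single dimension
        current = run_ecl(config)        # one simulation run per intermediate config
        steps.append((key, current - previous))
        previous = current
    steps.append(("end", previous))
    return steps


# Toy stand-in for an ECL run: integer factors keep the arithmetic exact.
EXPOSURE = {"2023-12-31": 100, "2024-03-31": 110}
MODEL_FACTOR = {"pd_v1": 2, "pd_v2": 3}
SCENARIO_FACTOR = {"baseline": 1, "adverse": 2}


def run_ecl(config):
    return (EXPOSURE[config["portfolio"]]
            * MODEL_FACTOR[config["model"]]
            * SCENARIO_FACTOR[config["scenario"]])


start = {"portfolio": "2023-12-31", "model": "pd_v1", "scenario": "baseline"}
end = {"portfolio": "2024-03-31", "model": "pd_v2", "scenario": "adverse"}
steps = waterfall_steps(start, end, run_ecl)
```

The intermediate deltas always sum to the difference between the closing and starting ECL, although the size of each individual bar depends on the order in which the settings are switched.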
Reproducing these Processes With Minimal Effort For Your Own Data#
The intent of this project is to enable Risk and Finance teams to understand how Dataiku can be used to confidently transition to a modernized data pipeline and modeling approach. The solution can be scaled across credit portfolios in just a few steps to create comprehensive credit exposure modeling with extensive governance.
We’ve provided several suggestions on how to use historical loan, mortgage, and housing price data to evaluate credit portfolios but ultimately the “best” approach will depend on your specific needs and your data of interest. If you’re interested in adapting this project to the specific goals and needs of your organization, roll-out and customization services can be offered on demand.