


DOI: https://doi.org/10.52903/wp2023315

IS COVID-19 REFLECTED IN ANACREDIT DATASET? A BIG DATA - MACHINE LEARNING APPROACH FOR ANALYSING BEHAVIOURAL PATTERNS USING LOAN LEVEL GRANULAR INFORMATION


Anastasios Petropoulos
Bank of Greece

Evangelos Stavroulakis
Bank of Greece

Panagiotis Lazaris
Bank of Greece

Vasilis Siakoulis
Bank of Greece

Nikolaos Vlachogiannakis
Bank of Greece

Abstract

In this study, we explore the impact of the COVID-19 pandemic on the default risk of loan portfolios of the Greek banking system, using cutting-edge machine learning techniques such as deep learning. Our analysis is based on loan-level monthly data, spanning a 42-month period, collected through the ECB AnaCredit database. Our dataset contains more than three million records, covering both the pre- and post-pandemic periods. We develop a series of credit rating models implementing state-of-the-art machine learning algorithms. Through an extensive validation process, we identify the best machine learning technique for building a behavioral credit scoring model, and subsequently we investigate the estimated sensitivities of various features in predicting default risk. To select the best candidate model, we compare the classification accuracy of the proposed methods over a 2-month out-of-time period. Our empirical results indicate that Deep Neural Networks (DNN) have superior predictive performance, signalling better generalization capacity than Random Forests, Extreme Gradient Boosting (XGBoost), and logistic regression. The proposed DNN model can accurately simulate the non-linearities caused by the pandemic outbreak in the evolution of default rates for Greek corporate customers. Under this multivariate setup, we apply interpretability algorithms to isolate the impact of COVID-19 on the probability of default, controlling for the rest of the features of the DNN. Our results indicate that the impact of the pandemic peaks in the first year and then slowly decreases, though it has not yet returned to pre-COVID-19 levels. Furthermore, our empirical results suggest different behavioral patterns between Stage 1 and Stage 2 loans, and that default rate sensitivities vary significantly across sectors.
The current empirical work can facilitate a more in-depth analysis of the AnaCredit database by providing robust statistical tools for more effective and responsive micro and macro supervision of credit risk.
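
As an illustration of the out-of-time comparison described above, the following is a minimal sketch and not the authors' implementation. It assumes each record carries a month index from 1 to 42, holds out the last two months, and ranks two hypothetical models by AUC on the held-out records. The field names, the toy scores, and the choice of AUC as the accuracy measure are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]


def out_of_time_split(
    records: List[Record], n_months: int = 42, holdout_months: int = 2
) -> Tuple[List[Record], List[Record]]:
    """Hold out the last `holdout_months` calendar months as the test set."""
    cutoff = n_months - holdout_months
    train = [r for r in records if r["month"] <= cutoff]
    test = [r for r in records if r["month"] > cutoff]
    return train, test


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC: probability that a random defaulter is scored above a random
    non-defaulter, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defaulter and one non-defaulter")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two loans per month, the first one defaulting.
records = [
    {"month": m, "loan_id": i, "default": 1 if i == 0 else 0}
    for m in range(1, 43)
    for i in range(2)
]
train, test = out_of_time_split(records)

# Hypothetical out-of-time scores from two candidate models.
labels = [r["default"] for r in test]  # [1, 0, 1, 0]
scores_model_a = [0.9, 0.1, 0.8, 0.2]
scores_model_b = [0.6, 0.7, 0.8, 0.2]

results = {
    "model_a": auc(labels, scores_model_a),
    "model_b": auc(labels, scores_model_b),
}
best_model = max(results, key=results.get)
```

Splitting by calendar month rather than at random keeps every test observation later than every training observation, which is what makes the comparison a check of generalization over time.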

Keywords: Credit Risk, Deep Learning, AnaCredit, COVID-19

JEL-classification: G24, C38, C45, C55

Disclaimer: The views expressed in this paper are those of the authors and not of the Bank of Greece.

Correspondence:
Anastasios Petropoulos
Bank of Greece
Amerikis 3, Athens, 102 50
Email: APetropoulos@bankofgreece.gr



