Production
https://prod.org.br/article/doi/10.1590/0103-6513.20170110
Research Article

Application of Bayesian additive regression trees in the development of credit scoring models in Brazil

Abstract: Paper aims: This paper compares the performance of Bayesian additive regression trees (BART), Random Forest (RF), and the logistic regression model (LRM) in the development of credit scoring models.

Originality: BART is rarely used to analyze credit scoring data. The database, provided by Serasa-Experian, contains information on direct retail consumer credit operations. Credit bureau variables are also rarely used in academic papers.

Research method: Several models were fitted, and their performance was compared using standard methods.
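The abstract does not name the comparison criteria. Two measures widely used to compare credit scoring models are the area under the ROC curve (AUC) and the Gini coefficient, computed as 2*AUC - 1. The sketch below is a minimal, illustrative pure-Python version, not the authors' code; it assumes label 1 marks a defaulting (bad) customer and that a higher score means higher predicted risk. The function names `auc` and `gini` and the example scores are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen bad payer (label 1)
    receives a higher risk score than a randomly chosen good payer (label 0).
    Ties count as one half (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient (accuracy ratio), a common credit scoring metric: 2*AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    # Hypothetical predicted default probabilities from two fitted models.
    model_a = [0.1, 0.4, 0.35, 0.8]
    model_b = [0.2, 0.3, 0.6, 0.9]
    print(auc(y, model_a), gini(y, model_a))  # 0.75 0.5
    print(auc(y, model_b), gini(y, model_b))  # 1.0 1.0
```

The pairwise loop is quadratic in sample size. It is kept here for readability; a rank-based formula gives the same value faster on large portfolios.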

Main findings: The analysis confirms that the BART model outperforms the LRM on the analyzed data. RF outperformed the LRM only on the balanced sample. The best-fitting BART model outperformed RF.
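The abstract contrasts results on a balanced sample without saying how that sample was built. A common approach is random undersampling: keep every bad payer and draw an equally sized random subset of good payers. Oversampling methods such as SMOTE are an alternative. The sketch below assumes random undersampling and uses a hypothetical helper `undersample`. It illustrates the idea and is not necessarily the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def undersample(rows: Sequence[T], labels: Sequence[int],
                seed: int = 0) -> Tuple[List[T], List[int]]:
    """Random undersampling: keep all minority-class examples and a random
    subset of the majority class of the same size, giving a 50/50 sample."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (idx1, idx0) if len(idx1) <= len(idx0) else (idx0, idx1)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)  # avoid ordering the output by class
    return [rows[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical portfolio in which 10% of customers are bad payers (label 1).
    labels = [1] * 10 + [0] * 90
    rows = list(range(100))
    x_bal, y_bal = undersample(rows, labels, seed=42)
    print(len(y_bal), sum(y_bal))  # 20 10
```

Undersampling discards good-payer records. Predicted probabilities from a model trained on the balanced sample therefore overstate the true default rate and need recalibration before use as probabilities of default.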

Implications for theory and practice: The results suggest that BART or RF may improve the performance of credit scoring models.
Keywords: Credit, Machine learning, Logistic regression, BART, Random Forest
