Research Article

Application of Bayesian additive regression trees in the development of credit scoring models in Brazil

Brito Filho, Daniel Alves de; Artes, Rinaldo



Abstract: Paper aims: This paper compares the performance of Bayesian additive regression trees (BART), random forest (RF), and the logistic regression model (LRM) in the development of credit scoring models.

Originality: The BART methodology is rarely used for the analysis of credit scoring data. The database, provided by Serasa-Experian, contains information on direct retail consumer credit operations. Credit bureau variables are seldom used in academic papers.

Research method: Several models were fitted and their performances were compared using standard methods.

Main findings: The analysis confirms the superiority of the BART model over the LRM for the analyzed data. RF was superior to the LRM only for the balanced sample. The best-fitting BART model was superior to RF.

Implications for theory and practice: The paper suggests that the use of BART or RF may yield better results than logistic regression in credit scoring modelling.


Keywords: Credit, Machine learning, Logistic regression, BART, Random Forest


