Systematic Review

Data mining in occupational safety and health: a systematic mapping and roadmap

Beatriz Lavezo dos Reis; Ana Caroline Francisco da Rosa; Ageu de Araujo Machado; Simone Luzia Santana Sambugaro Wencel; Gislaine Camila Lapasini Leal; Edwin Vladimir Cardoza Galdamez; Rodrigo Clemente Thom de Souza


Paper aims: This study presents an overview of the literature on data mining and machine learning applications in occupational safety and health (OSH).

Originality: The article closes with a summary of the main insights from the systematic mapping and a roadmap of recommendations to guide future research on the topic.

Research method: This article conducts descriptive research on the scientific literature through a systematic mapping that covers 12 scientific databases and the period from 2008 to 2019, yielding 68 selected records.

Main findings: Around 84% of the selected records were rated as fully relevant to the research, and most of them were classified in the civil construction and steel industries.

Implications for theory and practice: This study shows how research on this theme has developed and points to guidelines for future studies. Another contribution is the identification of studies on the OSH 4.0 concept, which is based on full-time monitoring of workers.


Keywords: Machine learning, Safety and health at work, Occupational accidents


