Production
https://prod.org.br/article/doi/10.1590/S0103-65132013005000001
Article

Variable selection for classifying production batches based on multiple criteria

A multiple criteria-based method for variable selection in industrial applications

Anzanello, Michel José

Abstract (translated from the Portuguese)

Industrial processes are often described by a large number of correlated and noisy variables. This article presents a method for selecting the variables most relevant to classifying production batches, drawing on multiple performance criteria (sensitivity and specificity). Batches are categorized into two classes (conforming or non-conforming, for example). The method uses PLS (Partial Least Squares) regression to derive an importance index for the process variables. An iterative procedure of batch classification and variable elimination is then carried out. Finally, a weighted Euclidean distance measure is applied to select the best subset of variables. When applied to industrial process data, the proposed method retained, on average, 12% of the original variables, raising sensitivity by 9%, from 0.78 to 0.85, and specificity by 20%, from 0.64 to 0.77. Simulation studies were used to assess the method's performance under different scenarios.

Keywords (translated from the Portuguese)

Variable selection. Multiple criteria. PLS regression
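
The percentage gains quoted above are relative improvements over the baseline values rather than differences in percentage points. As a quick check on the reported figures:

```latex
\frac{0.85 - 0.78}{0.78} \approx 0.090 \;(\approx 9\%), \qquad
\frac{0.77 - 0.64}{0.64} \approx 0.203 \;(\approx 20\%)
```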

Abstract

Industrial processes generate many correlated and noisy variables. This paper proposes a method for selecting the most relevant process variables for classifying production batches, based on multiple criteria (e.g., sensitivity and specificity). Production batches are assigned to one of two classes. The method first applies PLS (Partial Least Squares) regression to the process data and derives a variable importance index. A classification/elimination procedure is then carried out, and a weighted Euclidean distance is computed to identify the recommended variable subset. When applied to the testing set of real industrial data, the proposed method retained, on average, 12% of the original variables. The recommended subsets yielded 9% higher sensitivity, from 0.78 to 0.85, and 20% higher specificity, from 0.64 to 0.77. Simulation experiments are also performed.
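
The classification/elimination loop and the distance-based choice of subset can be illustrated with a minimal sketch. It is not the author's implementation, and several pieces are assumptions made only for illustration: the importance values are hypothetical stand-ins for the PLS-derived index (the PLS step itself is not shown), a nearest-centroid rule stands in for whatever classifier the paper uses, the ideal point is taken as (sensitivity, specificity) = (1, 1), the two criteria are weighted equally, and ties are broken in favor of the smaller subset.

```python
import math

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def nearest_centroid_predict(X_train, y_train, X_test, cols):
    """Stand-in classifier (assumption): assign each batch to the closer class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    preds = []
    for x in X_test:
        dist = {lab: sum((x[c] - m) ** 2 for c, m in zip(cols, cen))
                for lab, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds

def weighted_distance(sens, spec, w_sens=0.5, w_spec=0.5):
    """Weighted Euclidean distance to the assumed ideal point (1, 1); weights are hypothetical."""
    return math.sqrt(w_sens * (1 - sens) ** 2 + w_spec * (1 - spec) ** 2)

def select_subset(X_train, y_train, X_test, y_test, importance):
    """Classify, record both criteria, drop the least important variable, repeat."""
    remaining = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    best = None
    while remaining:
        preds = nearest_centroid_predict(X_train, y_train, X_test, remaining)
        sens, spec = sensitivity_specificity(y_test, preds)
        dist = weighted_distance(sens, spec)
        # "<=" prefers the smaller subset when two subsets are equally close to the ideal.
        if best is None or dist <= best[0]:
            best = (dist, list(remaining), sens, spec)
        remaining.pop()  # eliminate the least important remaining variable
    return best

# Toy data: variable 0 separates the classes, variables 1 and 2 are noise.
X_train = [[1.0, 0.0, 4.0], [1.2, 0.2, 6.0], [0.0, 0.1, 0.0], [0.2, 0.3, 2.0]]
y_train = [1, 1, 0, 0]
X_test = [[1.1, 0.1, 1.0], [0.9, 0.2, 5.0], [0.1, 0.2, 5.0], [0.0, 0.1, 1.0]]
y_test = [1, 1, 0, 0]
importance = [0.9, 0.2, 0.1]  # hypothetical index values, not computed by PLS here

best = select_subset(X_train, y_train, X_test, y_test, importance)
print(best)  # (distance, retained variable indices, sensitivity, specificity)
```

On this toy data the noisy third variable misleads the classifier when all variables are used, so the loop ends up retaining only variable 0. Changing `w_sens` and `w_spec` shifts the trade-off between the two criteria.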

Keywords

Variable selection. Multiple criteria. PLS regression
