Variable selection for classifying production batches based on multiple criteria (original title: Seleção de variáveis para classificação de bateladas produtivas com base em múltiplos critérios)
A multiple criteria-based method for variable selection in industrial applications
Anzanello, Michel José
http://dx.doi.org/10.1590/S0103-65132013005000001
Prod., vol. 23, n. 4, p. 858-865, 2013
Abstract
Industrial processes are often described by a large number of correlated and noisy variables. This paper proposes a method for selecting the process variables most relevant to classifying production batches, using multiple performance criteria (sensitivity and specificity). Each batch is assigned to one of two classes (for example, conforming or nonconforming). The method first applies PLS (Partial Least Squares) regression to the process data to derive a variable importance index. An iterative procedure of batch classification and variable elimination is then carried out. Finally, a weighted Euclidean distance is used to identify the recommended variable subset. When applied to the testing set of real industrial data, the method retained, on average, 12% of the original variables; the recommended subsets raised sensitivity by 9%, from 0.78 to 0.85, and specificity by 20%, from 0.64 to 0.77. Simulation experiments assessed the method's performance under different scenarios.
Keywords
Variable selection. Multiple criteria. PLS regression
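The final step named in the abstract, choosing a variable subset by a weighted Euclidean distance over sensitivity and specificity, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not state the reference point or the weights, so the ideal point (sensitivity = 1, specificity = 1), the equal weights, the function names, the choice of positive class, and the `n_vars` values are all assumptions. The first two candidate rows reuse the average figures reported in the abstract; the third is invented for contrast.

```python
from math import sqrt


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a two-class problem.

    Which class counts as 'positive' is an assumption of this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def weighted_distance(sens, spec, w_sens=0.5, w_spec=0.5):
    """Weighted Euclidean distance to the assumed ideal point (1, 1)."""
    return sqrt(w_sens * (1.0 - sens) ** 2 + w_spec * (1.0 - spec) ** 2)


def select_subset(candidates, w_sens=0.5, w_spec=0.5):
    """Pick the candidate subset closest to the assumed ideal point."""
    return min(
        candidates,
        key=lambda c: weighted_distance(c["sens"], c["spec"], w_sens, w_spec),
    )


# One record per elimination step; n_vars values are hypothetical.
candidates = [
    {"n_vars": 100, "sens": 0.78, "spec": 0.64},  # averages before selection (abstract)
    {"n_vars": 12, "sens": 0.85, "spec": 0.77},   # averages after selection (abstract)
    {"n_vars": 3, "sens": 0.90, "spec": 0.50},    # invented row, for contrast only
]

best = select_subset(candidates)
print(best["n_vars"])
```

Raising `w_sens` relative to `w_spec` shifts the choice toward subsets that miss fewer positive-class batches, at the cost of more false alarms on the other class. In this sketch, that trade-off is the reason for using a weighted distance rather than a single accuracy figure.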