Production
https://prod.org.br/article/doi/10.1590/0103-6513.20240076
Research Article

Normal nonparametric test for small samples of categorized variables

Jose Luiz Contador; Edson Luiz França Senne; Jose Celso Contador

Abstract

Paper aims: Introduce a new statistical test to verify whether two small samples of a variable classified into multiple categories are drawn from the same population. This problem can be represented by an m × 2 contingency table.
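To make the problem setting concrete, the sketch below builds the m × 2 table from two small samples of category labels and estimates a p-value with a generic Monte Carlo permutation test on the Pearson chi-square statistic. This is not the test proposed in the paper, whose construction the abstract does not describe; the function names, the choice of statistic, and the parameters `n_perm` and `seed` are illustrative assumptions.

```python
import random
from collections import Counter


def chi2_stat(table):
    """Pearson chi-square statistic for an m x 2 table given as (a_i, b_i) rows."""
    col_a = sum(a for a, _ in table)
    col_b = sum(b for _, b in table)
    if col_a == 0 or col_b == 0:
        raise ValueError("both samples must be non-empty")
    n = col_a + col_b
    stat = 0.0
    for a, b in table:
        row = a + b
        if row == 0:
            continue  # an empty category contributes nothing
        for obs, col in ((a, col_a), (b, col_b)):
            exp = row * col / n  # expected count under "same population"
            stat += (obs - exp) ** 2 / exp
    return stat


def permutation_p_value(sample_a, sample_b, n_perm=10_000, seed=0):
    """Monte Carlo permutation p-value (hypothetical helper, not the paper's test)."""
    rng = random.Random(seed)
    categories = sorted(set(sample_a) | set(sample_b))

    def to_table(xs, ys):
        cx, cy = Counter(xs), Counter(ys)
        return [(cx[c], cy[c]) for c in categories]

    observed = chi2_stat(to_table(sample_a, sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign labels to the two samples at random
        stat = chi2_stat(to_table(pooled[:n_a], pooled[n_a:]))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(["x", "x", "y", "z"], ["y", "z", "z", "z"])` returns an estimated p-value for the hypothesis that both samples come from the same population; a larger `n_perm` reduces the Monte Carlo error at the cost of run time.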

Originality: No adequate asymptotic tests exist to handle every instance of this problem, and exact methods require substantial computational effort and specialized algorithms. The proposed test fills this gap.

Research method: The study can be classified as design science research. Both the result and the research process meet the guidelines of that research method.

Main findings: Computational experiments show that the proposed test is about as effective as the exact test, even for sparse contingency tables and small values of m. Furthermore, examples show that it can work well in cases where the chi-square test, its numerous variations, and even more recently developed methods fail.
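As background on why sparse tables are hard for the chi-square test: a widely used rule of thumb, which the abstract itself does not state, treats the chi-square approximation as unreliable when many expected cell counts fall below about 5. The sketch below computes the expected counts of an m × 2 table and the share of cells under that threshold; the function names and the default `threshold` are assumptions chosen for illustration.

```python
def expected_counts(table):
    """Expected cell counts of an m x 2 table under the same-population hypothesis."""
    col_totals = [sum(row[j] for row in table) for j in range(2)]
    n = sum(col_totals)
    if n == 0:
        raise ValueError("table must contain at least one observation")
    return [[sum(row) * col_totals[j] / n for j in range(2)] for row in table]


def share_of_small_cells(table, threshold=5.0):
    """Fraction of cells whose expected count is below `threshold` (assumed cutoff)."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < threshold for e in cells) / len(cells)
```

For a small, sparse table such as `[(3, 1), (0, 2), (1, 3)]`, every expected count is below 5, which is the kind of situation the abstract refers to.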

Implications for theory and practice: This type of decision problem has received significant attention in the literature because it represents many real-life situations. The proposed test is as useful for small samples as the chi-square test is for larger samples.

Keywords

Nonparametric tests, Small samples, Nominal variables, Permutation tests, Computer simulation

Submitted date:
07/06/2024

Accepted date:
06/02/2025
