Clustering rows and/or columns of a two-way contingency table and a related distribution theory. (English) Zbl 1454.62022

Summary: The row-wise multiple comparison procedure proposed in [C. Hirotsu, Biometrika 70, 579–589 (1983; Zbl 0534.62036)] has proved useful for clustering the rows and/or columns of a contingency table in several applications. Although that method improved on earlier work, there remained a gap between the squared distance between two clusters of rows and the largest root of a Wishart matrix used as the reference statistic for evaluating the significance of the clustering. In this paper we extend the squared distance to a generalized squared distance among any number of rows or clusters of rows, which removes the loss of power that arises in the clustering procedure. If the columns have a natural ordering, we define an order-sensitive squared distance; the reference distribution then becomes that of the largest root of a non-orthogonal Wishart matrix, which is very difficult to handle. We therefore propose a \(\chi ^{2}\)-approximation that improves on both the usual normal approximation in [T. W. Anderson, An introduction to multivariate statistical analysis. 3rd ed. Hoboken, NJ: Wiley (2003; Zbl 1039.62044)] and the first \(\chi ^{2}\)-approximation introduced in [C. Hirotsu, Biometrika 78, No. 3, 583–594 (1991; Zbl 0778.62062)]. A two-way table reported by L. Guttman [“Measurement as structural theory”, Psychometrika 36, 329–347 (1971; doi:10.1007/BF02291362)] and analyzed by M. J. Greenacre [J. Classif. 5, No. 1, 39–51 (1988; Zbl 0652.62053)] is reanalyzed, and a clear interpretation of the data is obtained.
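The clustering loop described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure. It assumes that the squared distance between two clusters is the chi-squared-type statistic of the 2 × b table obtained by pooling each cluster into one row, with column proportions taken from the whole table, and it only handles pairwise distances (the generalized distance among more than two clusters and the order-sensitive distance are not covered). The reference distribution is assumed to be that of the largest root of a Wishart matrix \(W_{b-1}(I, a-1)\) for an a × b table, and it is approximated here by Monte Carlo simulation, a swapped-in technique rather than the \(\chi^2\)-approximation proposed in the paper. The function names and the example table are hypothetical.

```python
import numpy as np


def squared_distance(table, rows_a, rows_b):
    """Hypothetical chi-squared-type squared distance between two row clusters.

    Assumption (our reading, not the paper's definition): each cluster is
    collapsed into one row by summing its counts, and the distance is
    n_a*n_b/(n_a+n_b) * sum_j (p_aj - p_bj)^2 / c_j, where p are the row
    profiles and c_j the column proportions of the whole table (all > 0).
    """
    table = np.asarray(table, dtype=float)
    col_prop = table.sum(axis=0) / table.sum()   # overall column proportions c_j
    ya = table[list(rows_a)].sum(axis=0)         # pooled counts of cluster A
    yb = table[list(rows_b)].sum(axis=0)         # pooled counts of cluster B
    na, nb = ya.sum(), yb.sum()
    diff = ya / na - yb / nb                     # difference of row profiles
    return float(na * nb / (na + nb) * np.sum(diff ** 2 / col_prop))


def cluster_rows(table):
    """Agglomerative clustering: repeatedly merge the closest pair of clusters.

    Returns the merge history as (cluster_a, cluster_b, distance) triples.
    """
    clusters = [[i] for i in range(len(table))]
    history = []
    while len(clusters) > 1:
        d, p, q = min(
            (squared_distance(table, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        history.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return history


def largest_root_quantile(dim, df, level=0.95, n_sim=20000, seed=0):
    """Monte Carlo quantile of the largest root of a Wishart W_dim(I, df).

    Stand-in for an analytic approximation: simulate df x dim standard normal
    matrices Z and take the largest eigenvalue of Z^T Z.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sim, df, dim))
    w = np.swapaxes(z, 1, 2) @ z                 # stacked dim x dim Wishart draws
    roots = np.linalg.eigvalsh(w)[:, -1]         # eigvalsh sorts ascending
    return float(np.quantile(roots, level))


if __name__ == "__main__":
    # Illustrative 4 x 3 table (made up, not data from the paper).
    table = np.array([[20, 15, 5], [18, 17, 6], [5, 10, 25], [4, 12, 24]])
    for a, b, d in cluster_rows(table):
        print(f"merge {a} + {b}: squared distance {d:.3f}")
    n_rows, n_cols = table.shape
    # Assumed reference: largest root of W_{b-1}(I, a-1) for an a x b table.
    print("95% point:", largest_root_quantile(dim=n_cols - 1, df=n_rows - 1))
```

Because the assumed distance weights squared profile differences by the overall column proportions, the merge distances of a complete agglomeration sum to Pearson's \(\chi^2\) statistic for the whole table, which gives a simple check on the sketch.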

MSC:

62-08 Computational methods for problems pertaining to statistics
62H17 Contingency tables
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J15 Paired and multiple comparisons; multiple testing

References:

[1] Aida, M.; Hirotsu, C., A method for comparing the multinomial distributions under order constraints and the table of percentiles, Jap. J. appl. statist., 12, 101-110, (1983), (in Japanese)
[2] Anderson, T.W., An introduction to multivariate statistical analysis, 3rd ed., (2003), Wiley-Interscience, Hoboken, NJ · Zbl 1039.62044
[3] Gilula, Z., Grouping and association in contingency tables: an exploratory canonical correlation approach, J. amer. statist. assoc., 81, 773-779, (1986) · Zbl 0648.62061
[4] Greenacre, M.J., Clustering the rows and columns of a contingency table, J. classification, 5, 39-51, (1988) · Zbl 0652.62053
[5] Guttman, L., Measurement as structural theory, Psychometrika, 36, 329-347, (1971)
[6] Hirotsu, C., Multiple comparisons and clustering rows in a contingency table, Quality, 7, 27-33, (1977), (in Japanese)
[7] Hirotsu, C., Use of cumulative efficient scores for testing ordered alternatives in discrete models, Biometrika, 69, 565-577, (1982) · Zbl 0494.62057
[8] Hirotsu, C., Defining the pattern of association in two-way contingency tables, Biometrika, 70, 579-589, (1983) · Zbl 0534.62036
[9] Hirotsu, C., An approach to comparing treatments based on repeated measures, Biometrika, 78, 583-594, (1991) · Zbl 0778.62062
[10] Hirotsu, C., Beyond analysis of variance techniques: some applications in clinical trials, Int. statist. rev., 61, 183-201, (1993) · Zbl 0825.62861
[11] Hirotsu, C.; Ohta, E.; Hirose, N.; Shimizu, K., Profile analysis of 24-h measurements of blood pressure, Biometrics, 59, 907-915, (2003) · Zbl 1274.62788
[12] James, A.T., Distribution of matrix variates and latent roots derived from normal samples, Ann. math. statist., 35, 475-501, (1964) · Zbl 0121.36605
[13] James, A.T., Inference on latent roots by calculation of hypergeometric functions of matrix argument, pp. 209-235
[14] James, A.T., Calculation of zonal polynomial coefficients by use of the Laplace-Beltrami operator, Ann. math. statist., 39, 1711-1718, (1968) · Zbl 0177.47406
[15] Sugiura, N., Derivatives of the characteristic root of a symmetric or a Hermitian matrix with two applications in multivariate analysis, Communications in statistics, 1, 393-417, (1973) · Zbl 0259.62047