Measures, models, and graphical displays in the analysis of cross-classified data (with comments). (English) Zbl 0850.62093

The author examines alternative characterizations of departures from independence in the context of \(I\times J\) tables. Let \(P_{ij}\) be the probability for cell \((i,j)\), with row, column and grand totals being denoted by \(P_{i+},P_{+j}\) and \(P_{++}\), respectively. Let \(G_{ij}\) denote the natural logarithm of \(P_{ij}\) and write \(\widetilde G_{i\bullet}=\sum_j G_{ij}\mu_j\), \(\widetilde G_{\bullet j}=\sum_i G_{ij}\nu_i\), \(\widetilde G_{\bullet\bullet}=\sum_i\sum_j G_{ij}\mu_j\nu_i\).
Let \(\tilde\lambda_{ij}=G_{ij}-\widetilde G_{i\bullet}-\widetilde G_{\bullet j}+\widetilde G_{\bullet\bullet}\). The author terms this quantity the weighted log-linear interaction and uses \(\mu_j=P_{+j}\) and \(\nu_i=P_{i+}\) in his exposition. If, instead, \(\mu_j=1/J\) and \(\nu_i=1/I\), then the usual unweighted interaction \(\lambda_{ij}\) is obtained. A third quantity of interest is \(\Delta_{ij}\), which is termed the relative difference and is defined by \(\Delta_{ij}=(P_{ij}-P_{i+}P_{+j})/(P_{i+}P_{+j})\). For a \(2\times 2\) table \(\Delta=|\rho|\), where \(\rho\) is the correlation coefficient. The author demonstrates with simple \(2\times 2\) and \(3\times 3\) tables that data sets may display identical odds ratios and differing correlations, and vice versa. These two quantities lead to the following two families of models: the correlation family \(P_{ij}=P_{i+}P_{+j}(1+\sum^M_{m=1}\rho_mx_{im}y_{jm})\) and the weighted or unweighted association models typified by \(P_{ij}=\alpha_i\beta_j\exp(\sum^M_{m=1}\varphi_m\mu_{im}\nu_{jm})\). For the special case where \(M=1\) (one-component models) the latter gives the familiar RC models and, with equally spaced row and column scores, the still simpler uniform association model \(U\).
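The three measures defined above can be computed directly from a table of cell probabilities. The following is a minimal sketch, not taken from the paper: the function names `margins`, `interactions` and `relative_differences` are hypothetical, and the code assumes that every \(P_{ij}\) is strictly positive and that \(P_{++}=1\).

```python
import math

def margins(P):
    """Row totals P_{i+} and column totals P_{+j} of a table given as nested lists."""
    rows = [sum(r) for r in P]
    cols = [sum(c) for c in zip(*P)]
    return rows, cols

def interactions(P, weighted=True):
    """Log-linear interactions G_ij - G_i. - G_.j + G_..

    weighted=True uses mu_j = P_{+j} and nu_i = P_{i+};
    weighted=False uses mu_j = 1/J and nu_i = 1/I (the usual unweighted case).
    Assumes all cells are strictly positive and the table sums to 1.
    """
    I, J = len(P), len(P[0])
    rows, cols = margins(P)
    nu = rows if weighted else [1.0 / I] * I
    mu = cols if weighted else [1.0 / J] * J
    G = [[math.log(p) for p in r] for r in P]
    G_row = [sum(G[i][j] * mu[j] for j in range(J)) for i in range(I)]
    G_col = [sum(G[i][j] * nu[i] for i in range(I)) for j in range(J)]
    G_all = sum(G_row[i] * nu[i] for i in range(I))
    return [[G[i][j] - G_row[i] - G_col[j] + G_all for j in range(J)]
            for i in range(I)]

def relative_differences(P):
    """Delta_ij = (P_ij - P_{i+} P_{+j}) / (P_{i+} P_{+j})."""
    rows, cols = margins(P)
    return [[(P[i][j] - rows[i] * cols[j]) / (rows[i] * cols[j])
             for j in range(len(cols))] for i in range(len(rows))]

if __name__ == "__main__":
    P = [[0.30, 0.10],
         [0.20, 0.40]]  # illustrative 2 x 2 table, not data from the paper
    print(interactions(P, weighted=True))
    print(interactions(P, weighted=False))
    print(relative_differences(P))
```

Under independence all three measures vanish in every cell. In the unweighted \(2\times 2\) case, \(\lambda_{11}\) equals one quarter of the log odds ratio, which gives a convenient check on the arithmetic.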
The author demonstrates links between the weighted association models and the correlation models and relates the one-component association models to an underlying bivariate normal distribution. He analyzes three \(I\times J\) tables having ordered categories and demonstrates that in each case the RC model or a simpler variant provides an excellent fit despite the large sample sizes (up to 25000) involved. Simple diagrams showing the estimated row and column parameters are provided.
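As a small illustration of the uniform association model \(U\) defined above (\(M=1\) with equally spaced scores), the sketch below builds a table \(P_{ij}\propto\alpha_i\beta_j\exp(\varphi\,u_iv_j)\) and checks that every local log odds ratio equals \(\varphi\). The integer scores \(u_i=i\), \(v_j=j\) (unit spacing), the parameter values and the function names are assumptions made for the example, not quantities from the paper.

```python
import math

def uniform_association_table(alpha, beta, phi):
    """Cell probabilities proportional to alpha_i * beta_j * exp(phi * i * j).

    Row and column scores are taken to be 1..I and 1..J (an assumption:
    equally spaced with unit spacing). The table is rescaled to sum to 1,
    which only changes the alpha_i by a common factor.
    """
    raw = [[a * b * math.exp(phi * (i + 1) * (j + 1))
            for j, b in enumerate(beta)]
           for i, a in enumerate(alpha)]
    total = sum(sum(row) for row in raw)
    return [[x / total for x in row] for row in raw]

def local_log_odds_ratios(P):
    """Log odds ratios of the 2 x 2 subtables formed by adjacent rows and columns."""
    return [[math.log(P[i][j] * P[i + 1][j + 1] / (P[i][j + 1] * P[i + 1][j]))
             for j in range(len(P[0]) - 1)]
            for i in range(len(P) - 1)]

if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    P = uniform_association_table([1.0, 2.0, 1.5], [0.5, 1.0, 2.0, 1.0], phi=0.3)
    for row in local_log_odds_ratios(P):
        print([round(t, 6) for t in row])
```

With unit-spaced scores the \(\alpha_i\) and \(\beta_j\) cancel in each local odds ratio, leaving \(\varphi(u_{i+1}-u_i)(v_{j+1}-v_j)=\varphi\). This is why a single association parameter describes the whole table under \(U\).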
The paper ends with a consideration of extensions of the quasi-symmetry model to asymmetric models that include diagonal and triangular components.
There follows a series of invited comments from E. B. Andersen, J. P. Benzécri, A. Baccini, H. Caussinus and A. de Falguerolles, C. Clogg and C. R. Rao, and D. R. Cox and S. J. Haberman, followed by a lengthy reply by the author, who concentrates principally on the similarities and differences between the RC models and the correspondence analysis approach of Benzécri.


62-07 Data analysis (statistics) (MSC2010)
62A09 Graphical methods in statistics
62H17 Contingency tables