Elaboration and analysis of the limit form of a family of statistical association coefficients between relational variables. I.
(Conception et analyse de la forme limite d’une famille de coefficients statistiques d’association entre variables relationnelles. I.)

*(French)* Zbl 0851.62039

Summary: This study offers a broad synthesis of, and outlook on, a very general family of association coefficients between descriptive relational variables that we have developed. It also provides precise technical results. We assume that the descriptive variables are observed empirically on a set \(O\) of elementary objects. A given coefficient is obtained by statistically normalizing a raw association index with respect to a hypothesis of no relation (independence). The raw index \(s\) is built from a set-theoretic representation of the two relational variables to be compared. The case where both variables are unary gives a clear formulation of the comparison problem.

We analyze in particular the case where the two relations on \(O\) induced by the two descriptive variables to be compared are binary. This case is extremely useful in qualitative data analysis. The normalization of the raw index \(s\) takes into account the distribution of the random raw index \(S\) under an independence hypothesis. The "centered" index \(s - E(S)\), where \(E\) denotes mathematical expectation, is reduced by the standard deviation \(\sqrt{\text{var}(S)}\). A specific expression of the variance \(\text{var}(S)\) makes it possible to establish the limiting form of an association coefficient under natural asymptotic conditions. We then study carefully the very important cases where the descriptive variables are nominal or ordinal qualitative. The limit expression reveals the nature of the normalization from a purely formal point of view. Next, we take up the general case of comparing two \(q\)-ary relations, for which precise results are given. Finally, we describe our current research and its future development, in particular by situating this work within our approach to data analysis by means of hierarchical classification.
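The centering and reduction described above can be illustrated in the simplest (unary) setting. The sketch below is an assumption-laden illustration, not the paper's own construction: it takes two unary variables represented as subsets \(A\) and \(B\) of \(O\), uses \(s = |A \cap B|\) as the raw index, and assumes a permutational (hypergeometric) independence model, which the summary does not specify. The function name `standardized_association` is hypothetical.

```python
from math import sqrt


def standardized_association(a_set, b_set, n):
    """Return (s - E(S)) / sqrt(var(S)) for two subsets of an n-object set.

    Assumptions (not taken from the summary): the raw index is
    s = |A & B|, and independence is modelled by drawing a random subset
    of size |B| uniformly, so that S is hypergeometric.
    """
    a, b = len(a_set), len(b_set)
    if n < 2 or a > n or b > n:
        raise ValueError("need n >= 2 and subset sizes at most n")

    s = len(a_set & b_set)  # observed raw index

    # Mean and variance of a hypergeometric count.
    mean = a * b / n
    var = a * b * (n - a) * (n - b) / (n ** 2 * (n - 1))
    if var == 0:
        raise ValueError("degenerate case: S is constant under the model")

    return (s - mean) / sqrt(var)


# Small example on n = 10 objects labelled 0..9.
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
q = standardized_association(A, B, 10)
```

Here \(s = 2\), \(E(S) = 1.6\) and \(\text{var}(S) = 0.64\), so the standardized value is close to \(0.5\). The binary and \(q\)-ary cases treated in the paper require their own expressions for \(E(S)\) and \(\text{var}(S)\), which this sketch does not attempt to reproduce.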

##### MSC:

62H20 | Measures of association (correlation, canonical correlation, etc.) |

62-07 | Data analysis (statistics) (MSC2010) |