zbMATH — the first resource for mathematics

Copula analysis of mixture models. (English) Zbl 1304.65087
Summary: Contemporary computers collect databases that can be too large for classical methods to handle. The present work takes data whose observations are distribution functions (rather than the single numerical point values of classical data) and presents a new computational statistical methodology for grouping the distributions into classes. The clustering method links the sought partition to the decomposition of mixture densities, through the notions of a function of distributions and of multi-dimensional copulas. The new clustering technique is illustrated by identifying distinct temperature and humidity regions in a global climate dataset, and the results compare favorably with those obtained from the standard EM algorithm.
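
Illustrative sketch (not part of the original record, and not the authors' estimation algorithm): the summary's link between copulas and mixtures can be made concrete under some assumptions. Each unit is assumed to be observed as a distribution function F_i; for a fixed threshold t, the values F_i(t) have their own distribution function G_t; a copula then couples G_t1 and G_t2 into a joint law, and a mixture of such joint laws describes the classes. The Frank family is used here only because it appears in the reference list (Frank 1979). The names `frank_copula`, `empirical_G` and `mixture_cdf` are hypothetical.

```python
import math


def frank_copula(u, v, theta):
    """Frank copula C_theta(u, v) for theta != 0 and u, v in [0, 1]."""
    # expm1(x) = exp(x) - 1 and log1p(x) = log(1 + x) keep the formula
    # numerically stable near the boundaries of the unit square.
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def empirical_G(values_at_t, x):
    """Empirical distribution function of the observed values F_i(t).

    values_at_t holds F_i(t) for each unit i (assumed input layout);
    the result is the share of units with F_i(t) <= x.
    """
    return sum(v <= x for v in values_at_t) / len(values_at_t)


def mixture_cdf(x1, x2, components):
    """Joint CDF of a finite mixture of copula-coupled margins.

    components is a list of (weight, theta, G1, G2) tuples, where G1 and
    G2 are callables giving the class-specific margins at thresholds
    t1 and t2. The weights are assumed to be nonnegative and sum to 1.
    """
    return sum(
        p * frank_copula(G1(x1), G2(x2), theta)
        for p, theta, G1, G2 in components
    )


if __name__ == "__main__":
    # Two classes with uniform margins and opposite dependence signs.
    uniform = lambda x: min(max(x, 0.0), 1.0)
    classes = [(0.4, 2.0, uniform, uniform), (0.6, -3.0, uniform, uniform)]
    print(mixture_cdf(0.5, 0.5, classes))
```

In a clustering setting the weights, copula parameters and margins would have to be estimated from the data (for example by an EM-type iteration); the sketch only evaluates the model for given values.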

65C60 Computational problems in statistics (MSC2010)
[1] Achard V (1991) Trois Problèmes clés de l’Analyse 3D de la Structure Thermodynamique de l’Atmosphère par Satellite: Mesure du Contenu en Ozone; Classification des Masses d’Air; Modélisation Hyper Rapide du Transfert Radiatif. Ph.D. Dissertation, University of Paris
[2] Ali MM, Mikhail NN, Haq MS (1978) A class of bivariate distributions including the bivariate logistic. J Multivar Anal 8: 405–412 · Zbl 0387.62019
[3] Arabie P, Carroll JD (1980) MAPCLUS: a mathematical programming approach to fitting the ADCLUS model. Psychometrika 45: 211–235 · Zbl 0437.62059
[4] Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49: 803–821 · Zbl 0794.62034
[5] Bishop CM (1995) Neural networks for pattern recognition. Clarendon Press, Oxford
[6] Bock HH (1998) Clustering and neural networks. In: Rizzi A, Vichi M, Bock HH (eds) Advances in data science and classification. Springer, Berlin, pp 265–277 · Zbl 1051.91523
[7] Bock RD, Gibbons RD (1996) High-dimensional multivariate probit analysis. Biometrics 52: 1183–1194 · Zbl 0925.62193
[8] Brossier G (1990) Piecewise hierarchical clustering. J Classif 7: 197–216 · Zbl 0736.62052
[9] Celeux G, Diday E, Govaert G, Lechevallier Y, Ralambondrainy H (1989) Classification automatique des données. Dunod Informatique, Paris
[10] Celeux G, Diebolt J (1986) L’Algorithme SEM: Un algorithme d’apprentissage probabiliste pour la reconnaissance de mélange de densités. Revue de Statistiques Appliquées 34: 35–51 · Zbl 0607.62037
[11] Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14: 315–332 · Zbl 0937.62605
[12] Celeux G, Govaert G (1993) Comparison of the mixture and the classification maximum likelihood in cluster analysis. J Stat Comput Simul 47: 127–146
[13] Chédin A, Scott N, Wahiche C, Moulinier P (1985) The improved initialization inversion method: a high resolution physical method for temperature retrievals from satellites of TIROS-N series. J Appl Meteorol 24: 128–143
[14] Chan JSK, Kuk AYC (1997) Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 53: 86–97 · Zbl 1065.62503
[15] Clayton DG (1978) A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65: 141–151 · Zbl 0394.92021
[16] Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39: 1–38 · Zbl 0364.62022
[17] Diday E (1984) Une représentation visuelle des classes empiétantes: les pyramides. Rapport de Recherche 291 INRIA
[18] Diday E (2001) A generalization of the mixture decomposition problem in the symbolic data analysis framework. Rapport de Recherche, CEREMADE 112: 1–14
[19] Diday E, Schroeder A, Ok Y (1974) The dynamic clusters method in pattern recognition. In: Proceedings of international federation for information processing congress. Elsevier, New York, pp 691–697
[20] Diday E, Vrac M (2005) Mixture decomposition of distributions by copulas in the symbolic data analysis framework. Discrete Appl Math 147: 27–41 · Zbl 1058.62004
[21] Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis and density estimation. J Am Stat Assoc 97: 611–631 · Zbl 1073.62545
[22] Frank MJ (1979) On the simultaneous associativity of F(x, y) and x + y − F(x, y). Aequationes Mathematicae 19: 194–226 · Zbl 0444.39003
[23] Genest C, Ghoudi K (1994) Une famille de lois bidimensionnelles insolite. Comptes Rendus de l’Académie des Sciences, Paris, Série I 318: 351–354 · Zbl 0797.60017
[24] Genest C, MacKay J (1986) The joy of copulas: bivariate distributions with uniform marginals. Am Stat 40: 280–283
[25] Genest C, Rivest LP (1993) Statistical inference procedures for bivariate Archimedean copulas. J Am Stat Assoc 88: 1034–1043 · Zbl 0785.62032
[26] Gordon A (1999) Classification. 2nd edn. Chapman and Hall, Boca Raton · Zbl 0929.62068
[27] Hartigan JA, Wong MA (1979) Algorithm AS136. A k-means clustering algorithm. Appl Stat 28: 100–108 · Zbl 0447.62062
[28] Hillali Y (1998) Analyse et modélisation des données probabilistes: Capacités et lois multidimensionnelles. Ph.D. Dissertation, University of Paris
[29] Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall, New Jersey · Zbl 0665.62061
[30] James GM, Sugar CA (2003) Clustering for sparsely sampled functional data. J Am Stat Assoc 98: 397–408 · Zbl 1041.62052
[31] Kuk AYC, Chan JSK (2001) Three ways of implementing the EM algorithm when parameters are not identifiable. Biometrical J 43: 207–218 · Zbl 1152.62302
[32] Li LA, Sedransk N (1988) Mixtures of distributions: a topological approach. Ann Stat 16: 1623–1634 · Zbl 0663.62021
[33] McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York · Zbl 0963.62061
[34] Meng XL, Rubin DB (1991) Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J Am Stat Assoc 86: 899–909
[35] Nelsen RB (1999) An introduction to copulas. Springer, New York · Zbl 0909.62052
[36] Parzen E (1962) On estimation of a probability density function and mode. Ann Math Stat 33: 1065–1076 · Zbl 0116.11302
[37] Prakasa Rao BLS (1983) Nonparametric functional estimation. Academic Press, New York · Zbl 0542.62025
[38] Redner RA, Walker H (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26: 195–239 · Zbl 0536.62021
[39] Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components. J R Stat Soc Ser B 59: 731–792 · Zbl 0891.62020
[40] Schroeder A (1976) Analyse d’un mélange de distributions de probabilité de même type. Revue de Statistiques Appliquées 24: 39–62
[41] Schweizer B, Sklar A (1983) Probabilistic metric spaces. North-Holland, New York · Zbl 0546.60010
[42] Schweizer B (1984) Distributions are the numbers of the future. In: diNola A, Ventre A (eds) Proceedings of the mathematics of fuzzy systems meeting, Naples, Italy. University of Naples, Naples, pp 137–149
[43] Scott AJ, Symons MJ (1971) Clustering methods based on likelihood ratio criteria. Biometrics 27: 387–397
[44] Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London · Zbl 0617.62042
[45] Sklar A (1959) Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris 8: 229–231
[46] Symons MJ (1981) Clustering criteria and multivariate normal mixtures. Biometrics 37: 35–43 · Zbl 0473.62048
[47] Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation (with discussion). J Am Stat Assoc 82: 528–550
[48] Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York
[49] Vrac M (2002) Analyse et modélisation de données probabilistes par décomposition de mélange de copules et application à une base de données climatologiques. Ph.D. Dissertation, University of Paris
[50] Vrac M, Chédin A, Diday E (2005) Clustering a global field of atmospheric profiles by mixture decomposition of copulas. J Atmos Ocean Technol 22: 1445–1459
[51] Wei GCG, Tanner MA (1990) A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J Am Stat Assoc 85: 699–704
[52] Winsberg S, DeSoete G (1999) Latent class models for time series analysis. Appl Stoch Models Bus Ind 15: 183–194 · Zbl 0952.91072
[53] Yakowitz SJ, Spragins LD (1968) On the identifiability of finite mixtures. Ann Math Stat 39: 209–214 · Zbl 0155.25703
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.