Extreme deconvolution: inferring complete distribution functions from noisy, heterogeneous and incomplete observations. (English) Zbl 1223.62029

Summary: We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation-Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual \(d\)-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or “underlying” distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the heteroskedastic uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with conjugate priors on all of the model parameters and a “split-and-merge” procedure designed to avoid local maxima of the likelihood. We demonstrate the full method by applying it to the problem of inferring the three-dimensional velocity distribution of stars near the sun from noisy two-dimensional, transverse velocity measurements from the Hipparcos satellite.
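The summary describes the algorithm only in words. A minimal sketch of one EM iteration follows, assuming the usual linear-Gaussian reading of that description: each observation `w[i]` is a projection `R[i]` of an underlying vector drawn from the mixture, plus zero-mean Gaussian noise with known covariance `S[i]`. The function names (`log_gauss`, `xd_em_step`), the array layout, and the restriction to a common observed dimension are illustrative assumptions, not taken from the paper, and the conjugate priors and split-and-merge step are omitted.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a multivariate normal N(x | mean, cov)."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)


def xd_em_step(w, S, R, alpha, m, V):
    """One EM update for a Gaussian mixture observed through noisy projections.

    Assumed (hypothetical) array layout:
      w: (N, p) observations      S: (N, p, p) noise covariances
      R: (N, p, d) projections    alpha: (K,) mixture weights
      m: (K, d) component means   V: (K, d, d) component covariances
    Returns the updated (alpha, m, V) and the log-likelihood of the
    parameters that were passed in.
    """
    N, K, d = len(w), len(alpha), m.shape[1]
    logr = np.empty((N, K))   # log of unnormalized responsibilities
    b = np.empty((N, K, d))   # posterior mean of the underlying vector
    B = np.empty((N, K, d, d))  # posterior covariance of the underlying vector
    for i in range(N):
        for j in range(K):
            # Component j convolved with the noise of point i, in observed space.
            T = R[i] @ V[j] @ R[i].T + S[i]
            gain = V[j] @ R[i].T @ np.linalg.inv(T)
            resid = w[i] - R[i] @ m[j]
            logr[i, j] = np.log(alpha[j]) + log_gauss(w[i], R[i] @ m[j], T)
            b[i, j] = m[j] + gain @ resid
            B[i, j] = V[j] - gain @ R[i] @ V[j]
    # E-step: normalize responsibilities in log space for stability.
    lognorm = np.logaddexp.reduce(logr, axis=1)
    q = np.exp(logr - lognorm[:, None])
    # M-step: responsibility-weighted averages of the posterior moments.
    qj = q.sum(axis=0)
    new_alpha = qj / N
    new_m = np.einsum("ij,ijd->jd", q, b) / qj[:, None]
    diff = b - new_m[None, :, :]
    outer = np.einsum("ijd,ije->ijde", diff, diff)
    new_V = np.einsum("ij,ijde->jde", q, outer + B) / qj[:, None, None]
    return new_alpha, new_m, new_V, lognorm.sum()
```

Calling `xd_em_step` repeatedly until the returned log-likelihood stops increasing gives a maximum-likelihood fit of the underlying, error-deconvolved mixture. Setting every `R[i]` to the identity recovers the complete-data case with heteroskedastic errors. Data points with different numbers of observed components would need per-point array shapes, which this sketch does not handle.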


62G07 Density estimation
62F15 Bayesian inference
65C60 Computational problems in statistics (MSC2010)



