Model-based approach for household clustering with mixed scale variables. (English) Zbl 1474.62439

Summary: The Ministry of Social Development in Mexico is in charge of creating and assigning social programmes that target specific needs in the population in order to improve quality of life. To better target these programmes, the Ministry aims to find clusters of households with the same needs, based on demographic characteristics as well as the poverty conditions of the household. The available data consist of continuous, ordinal, and nominal variables, all of which come from a non-i.i.d. complex design survey sample. We propose a Bayesian nonparametric mixture model that jointly models a set of latent variables, as in an underlying variable response approach, associated with the observed mixed scale data, and that accommodates the different sampling probabilities. The performance of the model is assessed via simulated data. A full analysis of socio-economic conditions in households in the Mexican State of Mexico is presented.
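As a rough illustration of the underlying variable response idea mentioned in the summary, the sketch below maps latent continuous values to observed continuous, ordinal, and nominal values. It is not the authors' model: it has no mixture component and ignores sampling probabilities. The function names (`to_ordinal`, `to_nominal`, `observe`), the cutpoints, and the baseline-category rule for nominal variables are all assumptions made for this example.

```python
import random
from bisect import bisect_right


def to_ordinal(z, cutpoints):
    """Return the ordinal level whose interval (defined by sorted cutpoints) contains z."""
    return bisect_right(cutpoints, z)


def to_nominal(latents):
    """Return category 0 if every latent is negative, else 1 + index of the largest latent.

    This baseline-category rule is an assumption of the sketch, not taken from the summary.
    """
    best = max(range(len(latents)), key=lambda j: latents[j])
    return 0 if latents[best] < 0 else best + 1


def observe(latent_row, cutpoints):
    """Turn one row of latent values into (continuous, ordinal, nominal) observations."""
    z_cont, z_ord, z_nom = latent_row
    # The continuous variable is observed directly; the others are coarsened.
    return (z_cont, to_ordinal(z_ord, cutpoints), to_nominal(z_nom))


rng = random.Random(0)
cutpoints = [-1.0, 0.0, 1.0]  # hypothetical thresholds, giving 4 ordinal levels

# Five hypothetical households: one continuous latent, one ordinal latent,
# and two nominal latents (i.e. three nominal categories).
rows = [
    (rng.gauss(0, 1), rng.gauss(0, 1), [rng.gauss(0, 1), rng.gauss(0, 1)])
    for _ in range(5)
]
observed = [observe(row, cutpoints) for row in rows]
```

In a model of the kind the summary describes, a mixture would be placed on the latent values rather than on the observed mixed scale data, so that a single multivariate kernel can describe all variable types at once.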


62P25 Applications of statistics to social sciences
62H30 Classification and discrimination; cluster analysis (statistical aspects)
60G55 Point processes (e.g., Poisson, Cox, Hawkes processes)

