Erratum to: “The generalized linear mixed cluster-weighted model”.

*(English)* Zbl 1335.62100

Summary: Cluster-weighted models (CWMs) are a flexible family of mixture models for fitting the joint distribution of a random vector composed of a response variable and a set of covariates. CWMs act as a convex combination of the products of the marginal distribution of the covariates and the conditional distribution of the response given the covariates. In this paper, we introduce a broad family of CWMs in which the component conditional distributions are assumed to belong to the exponential family and the covariates are allowed to be of mixed-type. Under the assumption of Gaussian covariates, sufficient conditions for model identifiability are provided. Moreover, maximum likelihood parameter estimates are derived using the EM algorithm. Parameter recovery, classification assessment, and performance of some information criteria are investigated through a broad simulation design. An application to real data is finally presented, with the proposed model outperforming other well-established mixture-based approaches.
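
For orientation, the convex combination described in the summary can be sketched as the following mixture density. The notation is assumed here for illustration and is not taken from this record: \(G\) is the number of components, \(\pi_g\) are the mixing weights, and \(\boldsymbol{\theta}_g\), \(\boldsymbol{\psi}_g\) are the component parameters of the conditional and marginal distributions.

\[
p(\mathbf{x}, y) \;=\; \sum_{g=1}^{G} \pi_g \, p(y \mid \mathbf{x}; \boldsymbol{\theta}_g)\, p(\mathbf{x}; \boldsymbol{\psi}_g),
\qquad \pi_g > 0, \quad \sum_{g=1}^{G} \pi_g = 1 .
\]

In the family the summary describes, each \(p(y \mid \mathbf{x}; \boldsymbol{\theta}_g)\) would belong to the exponential family, while \(p(\mathbf{x}; \boldsymbol{\psi}_g)\) would cover the mixed-type covariates.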

Erratum to the authors’ paper [ibid. 32, No. 1, 85–113 (2015; Zbl 1331.62310)].

##### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

62J12 | Generalized linear models (logistic models) |

##### Keywords:

cluster-weighted models; model-based clustering; generalized linear models; mixed-type data

*S. Ingrassia* et al., J. Classif. 32, No. 2, 327–355 (2015; Zbl 1335.62100)

