
Multivariate hidden Markov regression models: random covariates and heavy-tailed distributions. (English) Zbl 1477.62224

Summary: Despite recent methodological advances in hidden Markov regression models and a rapid increase in their application across a wide range of empirical settings, complex clustering-based research questions that involve the contribution of the covariate set to the classification and the presence of atypical observations are often addressed while ignoring the possible effects of wrong model assumptions. Hidden Markov regression models with random covariates (HMRMRCs) have recently been proposed as an improvement over the classical fixed-covariates approach, allowing the covariates to contribute to the underlying clustering structure. To make the approach more flexible when all the considered random variables are continuous, HMRMRCs are defined here with a focus on three multivariate elliptical distributions: the normal (reference distribution), the \(t\), and the contaminated normal. The latter two, heavy-tailed generalizations of the normal distribution, are introduced to protect the reference model against the occurrence of mildly atypical points and to allow their automatic detection. Identifiability conditions are provided, EM-based algorithms are outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through Monte Carlo experiments with the aim of showing the consequences of wrong model assumptions on parameter estimates and inferred clustering. Artificial and real data analyses are provided to investigate the models' behavior in the presence of heterogeneity and atypical observations.
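As an illustration of how a contaminated normal distribution can flag atypical points, the following is a minimal sketch for a single component, not the authors' implementation. It assumes the common parameterization in which a proportion `alpha` of points follows the reference normal and the remainder follows a normal with the same mean and covariance inflated by `eta > 1`. The function names, the parameter values, and the 0.5 cut-off are assumptions made for this example.

```python
import numpy as np


def mvn_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance of each row from the mean
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1])
    return np.exp(log_norm - 0.5 * maha)


def contaminated_normal_pdf(x, mean, cov, alpha, eta):
    """Two-part scale mixture: reference normal plus an inflated-covariance normal."""
    good = mvn_pdf(x, mean, cov)
    bad = mvn_pdf(x, mean, eta * cov)
    return alpha * good + (1 - alpha) * bad


def prob_typical(x, mean, cov, alpha, eta):
    """Posterior probability that each row of x comes from the reference normal."""
    good = alpha * mvn_pdf(x, mean, cov)
    bad = (1 - alpha) * mvn_pdf(x, mean, eta * cov)
    return good / (good + bad)


# Hypothetical parameter values chosen only for illustration
mean = np.zeros(2)
cov = np.eye(2)
alpha, eta = 0.9, 10.0

points = np.array([[0.0, 0.0], [5.0, 5.0]])
p = prob_typical(points, mean, cov, alpha, eta)
# Assumed rule: flag a point as atypical when its posterior probability is below 0.5
flagged = p < 0.5
```

In a hidden Markov setting, a rule of this kind would be applied state by state with state-specific parameters estimated by the EM-type algorithm; the sketch above treats the parameters as known.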

MSC:

62M05 Markov processes: estimation; hidden Markov models
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J05 Linear regression; mixed models
62H12 Estimation in multivariate analysis
62F35 Robustness and adaptive procedures (parametric inference)
