Mixture of hidden Markov models for accelerometer data. (English) Zbl 07483748

Summary: Motivated by the analysis of accelerometer data taken across a population of individuals, we introduce a specific finite mixture of hidden Markov models with particular characteristics that adapt well to the specific nature of this type of longitudinal data. Our model allows for the computation of statistics that characterize the physical activity of a subject (e.g., the mean time spent at different activity levels and the probability of the transition between two activity levels) without specifying the activity levels in advance but by estimating them from the data. In addition, this approach allows the heterogeneity of the population to be taken into account and subpopulations with homogeneous physical activity behavior to be defined. We prove that, under mild assumptions, this model implies that the probability of misclassifying a subject decreases exponentially with the length of that subject's measurement sequence. Model identifiability is also investigated. We also report a comprehensive suite of numerical simulations to support our theoretical findings. The method is motivated by and applied to the Physical Activity and Transit Survey.


62Pxx Applications of statistics


seqHMM; LMest; MHMM; mHMMbayes


[1] Aarts, E. (2019). mHMMbayes: Multilevel hidden Markov models using Bayesian estimation. R package version 0.1.1.
[2] Allman, E. S., Matias, C. and Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. Ann. Statist. 37 3099-3132. · Zbl 1191.62003
[3] Altman, R. M. (2007). Mixed hidden Markov models: An extension of the hidden Markov model to the longitudinal data setting. J. Amer. Statist. Assoc. 102 201-210. · Zbl 1284.62803
[4] Bai, J., Sun, Y., Schrack, J. A., Crainiceanu, C. M. and Wang, M.-C. (2018). A two-stage model for wearable device data. Biometrics 74 744-752. · Zbl 1414.62430
[5] Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49 803-821. · Zbl 0794.62034
[6] Bartolucci, F., Farcomeni, A. and Pennoni, F. (2013). Latent Markov Models for Longitudinal Data. Statistics in the Social and Behavioral Sciences Series. CRC Press, Boca Raton, FL. · Zbl 1341.62002
[7] Bartolucci, F., Pandolfi, S. and Pennoni, F. (2017). LMest: An R package for latent Markov models for longitudinal categorical data. J. Stat. Softw. 81 1-38.
[8] Bartolucci, F., Pennoni, F. and Vittadini, G. (2011). Assessment of school performance through a multilevel latent Markov Rasch model. J. Educ. Behav. Stat. 36 491-522.
[9] Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22 719-725.
[10] Brault, V. and Mariadassou, M. (2015). Co-clustering through latent bloc model: A review. J. SFdS 156 120-139. · Zbl 1341.62172
[11] Cappé, O., Moulines, E. and Rydén, T. (2005). Inference in Hidden Markov Models. Springer Series in Statistics. Springer, New York. With Randal Douc’s contributions to Chapter 9 and Christian P. Robert’s to Chapters 6, 7 and 13, With Chapter 14 by Gersende Fort, Philippe Soulier and Moulines, and Chapter 15 by Stéphane Boucheron and Elisabeth Gassiat.
[12] Celeux, G. and Durand, J.-B. (2008). Selecting hidden Markov model state number with cross-validated likelihood. Comput. Statist. 23 541-564. · Zbl 1224.62039
[13] Celisse, A., Daudin, J.-J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Stat. 6 1847-1899. · Zbl 1295.62028
[14] Cole, R. J., Kripke, D. F., Gruen, W., Mullaney, D. J. and Gillin, J. C. (1992). Automatic sleep/wake identification from wrist activity. Sleep 15 461-469.
[15] Csiszár, I. and Talata, Z. (2006). Consistent estimation of the basic neighborhood of Markov random fields. Ann. Statist. 34 123-145. · Zbl 1102.62105
[16] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1-38. · Zbl 0364.62022
[17] Du Roy de Chaumaray, M., Marbac, M. and Navarro, F. (2020b). Supplement to “Mixture of hidden Markov models for accelerometer data.” https://doi.org/10.1214/20-AOAS1375SUPP
[18] Du Roy de Chaumaray, M., Marbac, M. and Navarro, F. (2020a). MHMM: Finite mixture of hidden Markov model. R package version 1.0.0.
[19] Dyrstad, S. M., Hansen, B. H., Holme, I. M. and Anderssen, S. A. (2014). Comparison of self-reported versus accelerometer-measured physical activity. Med. Sci. Sports Exerc. 46 99-106.
[20] Freedson, P. S., Melanson, E. and Sirard, J. (1998). Calibration of the Computer Science and Applications, Inc. accelerometer. Med. Sci. Sports Exerc. 30 777-781.
[21] Gassiat, E. (2002). Likelihood ratio inequalities with applications to various mixtures. Ann. Inst. Henri Poincaré Probab. Stat. 38 897-906. · Zbl 1011.62025
[22] Gassiat, E., Cleynen, A. and Robin, S. (2016). Inference in finite state space non parametric hidden Markov models and applications. Stat. Comput. 26 61-71. · Zbl 1342.62141
[23] Geraci, M. (2019). Additive quantile regression for clustered data with an application to children’s physical activity. J. R. Stat. Soc. Ser. C. Appl. Stat. 68 1071-1089.
[24] Geraci, M. and Farcomeni, A. (2016). Probabilistic principal component analysis to identify profiles of physical activity behaviours in the presence of non-ignorable missing data. J. R. Stat. Soc. Ser. C. Appl. Stat. 65 51-75.
[25] Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61 215-231. · Zbl 0281.62057
[26] Grandner, M. A., Sands-Lincoln, M. R., Pak, V. M. and Garland, S. N. (2013). Sleep duration, cardiovascular disease, and proinflammatory biomarkers. Nat. Sci. Sleep 5 93-107.
[27] Gruen, M. E., Alfaro-Córdoba, M., Thomson, A. E., Worth, A. C., Staicu, A.-M. and Lascelles, B. D. X. (2017). The use of functional data analysis to evaluate activity in a spontaneous model of degenerative joint disease associated pain in cats. PLoS ONE 12 e0169576.
[28] Helske, S. and Helske, J. (2019). Mixture hidden Markov models for sequence data: The seqHMM package in R. J. Stat. Softw. 88 1-32.
[29] Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 1090-1098. · Zbl 1041.62098
[30] Huang, Q., Cohen, D., Komarzynski, S., Li, X.-M., Innominato, P., Lévi, F. and Finkenstädt, B. (2018). Hidden Markov models for monitoring circadian rhythmicity in telemetric activity data. J. R. Soc. Interface 15 20170885.
[31] Huang, L., Bai, J., Ivanescu, A., Harris, T., Maurer, M., Green, P. and Zipunnikov, V. (2019). Multilevel matrix-variate analysis and its application to accelerometry-measured physical activity in clinical populations. J. Amer. Statist. Assoc. 114 553-564. · Zbl 1420.62454
[32] Hubert, L. and Arabie, P. (1985). Comparing partitions. J. Classification 2 193-218. · Zbl 0587.62128
[33] Hunt, L. and Jorgensen, M. (2011). Clustering mixed data. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1 352-361.
[34] Hunter, D. R., Goodreau, S. M. and Handcock, M. S. (2008). Goodness of fit of social network models. J. Amer. Statist. Assoc. 103 248-258. · Zbl 1471.62390
[35] Immerwahr, S., Wyker, B., Bartley, K. and Eisenhower, D. (2012). The physical activity and transit survey device follow-up study: Methodology report. The New York City Department of Health and Mental Hygiene.
[36] Innerd, P., Harrison, R. and Coulson, M. (2018). Using open source accelerometer analysis to assess physical activity and sedentary behaviour in overweight and obese adults. BMC Public Health 18 543.
[37] Karlis, D. and Meligkotsidou, L. (2007). Finite mixtures of multivariate Poisson distributions with application. J. Statist. Plann. Inference 137 1942-1960. · Zbl 1116.60006
[38] Kimm, S. Y., Glynn, N. W., Obarzanek, E., Kriska, A. M., Daniels, S. R., Barton, B. A. and Liu, K. (2005). Relation between the changes in physical activity and body-mass index during adolescence: A multicentre longitudinal study. Lancet 366 301-307.
[39] Kosmidis, I. and Karlis, D. (2016). Model-based clustering using copulas with applications. Stat. Comput. 26 1079-1099. · Zbl 06652996
[40] Lee, J. A. and Gill, J. (2018). Missing value imputation for physical activity data measured by accelerometer. Stat. Methods Med. Res. 27 490-506.
[41] Lee, I.-M., Shiroma, E. J., Lobelo, F., Puska, P., Blair, S. N., Katzmarzyk, P. T. et al. (2012). Effect of physical inactivity on major non-communicable diseases worldwide: An analysis of burden of disease and life expectancy. Lancet 380 219-229.
[42] Levin, D. A. and Peres, Y. (2017). Markov Chains and Mixing Times. Amer. Math. Soc., Providence, RI.
[43] Lim, Y., Oh, H.-S. and Cheung, Y. K. (2019). Functional clustering of accelerometer data via transformed input variables. J. R. Stat. Soc. Ser. C. Appl. Stat. 68 495-520.
[44] Lim, S., Wyker, B., Bartley, K. and Eisenhower, D. (2015). Measurement error of self-reported physical activity levels in New York City: Assessment and correction. Am. J. Epidemiol. 181 648-655.
[45] Maruotti, A. (2011). Mixed hidden Markov models for longitudinal data: An overview. Int. Stat. Rev. 79 427-454. · Zbl 1238.62094
[46] Matias, C., Rebafka, T. and Villers, F. (2018). A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika 105 665-680. · Zbl 06991025
[47] McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley Series in Probability and Statistics: Applied Probability and Statistics. Wiley, New York. · Zbl 0963.62061
[48] McNicholas, P. D. (2017). Mixture Model-Based Classification. CRC Press, Boca Raton, FL. · Zbl 1454.62005
[49] McTiernan, A. (2008). Mechanisms linking physical activity with cancer. Nat. Rev. Cancer 8 205-211.
[50] Morris, J. S., Arroyo, C., Coull, B. A., Ryan, L. M., Herrick, R. and Gortmaker, S. L. (2006). Using wavelet-based functional mixed models to characterize population heterogeneity in accelerometer profiles: A case study. J. Amer. Statist. Assoc. 101 1352-1364. · Zbl 1171.62357
[51] Noel, S. E., Mattocks, C., Emmett, P., Riddoch, C. J., Ness, A. R. and Newby, P. (2010). Use of accelerometer data in prediction equations for capturing implausible dietary intakes in adolescents. Am. J. Clin. Nutr. 92 1436-1445.
[52] Palta, P., McMurray, R., Gouskova, N., Sotres-Alvarez, D., Davis, S., Carnethon, M., Castañeda, S., Gellman, M., Hankinson, A. L. et al. (2015). Self-reported and accelerometer-measured physical activity by body mass index in US Hispanic/Latino adults: HCHS/SOL. Prev. Med. Rep. 2 824-828.
[53] Pollak, C. P., Tryon, W. W., Nagaraja, H. and Dzwonczyk, R. (2001). How accurately does wrist actigraphy identify the states of sleep and wakefulness? Sleep 24 957-965.
[54] Sadeh, A., Sharkey, M. and Carskadon, M. A. (1994). Activity-based sleep-wake identification: An empirical test of methodological issues. Sleep 17 201-207.
[55] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461-464. · Zbl 0379.62005
[56] Scott, S. L., James, G. M. and Sugar, C. A. (2005). Hidden Markov models for longitudinal comparisons. J. Amer. Statist. Assoc. 100 359-369. · Zbl 1117.62421
[57] Slootmaker, S. M., Schuit, A. J., Chinapaw, M. J., Seidell, J. C. and Van Mechelen, W. (2009). Disagreement in physical activity assessed by accelerometer and self-report in subgroups of age, gender, education and weight status. Int. J. Behav. Nutr. Phys. Act. 6 17.
[58] Taheri, S., Lin, L., Austin, D., Young, T. and Mignot, E. (2004). Short sleep duration is associated with reduced leptin, elevated ghrelin, and increased body mass index. PLoS Med. 1 e62.
[59] Teicher, H. (1963). Identifiability of finite mixtures. Ann. Math. Stat. 34 1265-1269. · Zbl 0137.12704
[60] Teicher, H. (1967). Identifiability of mixtures of product measures. Ann. Math. Stat. 38 1300-1302. · Zbl 0153.47904
[61] Titsias, M. K., Holmes, C. C. and Yau, C. (2016). Statistical inference in hidden Markov models using \(k\)-segment constraints. J. Amer. Statist. Assoc. 111 200-215.
[62] Troiano, R. P., Berrigan, D., Dodd, K. W., Masse, L. C., Tilert, T. and McDowell, M. (2008). Physical activity in the United States measured by accelerometer. Med. Sci. Sports Exerc. 40 181-188.
[63] US Department of Health and Human Services (2008). 2008 physical activity guidelines for Americans: Be active, healthy, and happy! Available at http://www.health.gov/paguidelines.
[64] van Hees, V. T., Sabia, S., Anderson, K. N., Denton, S. J., Oliver, J., Catt, M., Abell, J. G., Kivimäki, M., Trenell, M. I. et al. (2015). A novel, open access method to assess sleep duration using a wrist-worn accelerometer. PLoS ONE 10 e0142533.
[65] Van de Pol, F. and Langeheine, R. (1990). Mixed Markov latent class models. Sociol. Method. 213-247.
[66] Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inform. Theory 13 260-269. · Zbl 0148.40501
[67] Wallace, M. L., Buysse, D. J., Germain, A., Hall, M. H. and Iyengar, S. (2018). Variable selection for skewed model-based clustering: Application to the identification of novel sleep phenotypes. J. Amer. Statist. Assoc. 113 95-110. · Zbl 1398.62347
[68] Witowski, V., Foraita, R., Pitsiladis, Y., Pigeot, I. and Wirsik, N. (2014). Using hidden Markov models to improve quantifying physical activity in accelerometer data—a simulation study. PLoS ONE 9 e114089.
[69] Wong, C. S. and Li, W. K. (2000). On a mixture autoregressive model. J. R. Stat. Soc. Ser. B. Stat. Methodol. 62 95-115. · Zbl 0941.62095
[70] Wyker, B., Bartley, K., Holder-Hayes, E., Immerwahr, S., Eisenhower, D. and Harris, T. (2013). Self-reported and accelerometer-measured physical activity: A comparison in New York City. New York (NY): New York City Department of Health and Mental Hygiene: Epi Research Report, 1-12.
[71] Xiao, L., Huang, L., Schrack, J. A., Ferrucci, L., Zipunnikov, V. and Crainiceanu, C. M. (2015). Quantifying the lifetime circadian rhythm of physical activity: A covariate-dependent functional approach. Biostatistics 16 352-367.
[72] Yang, C.-C. and Hsu, Y.-L. (2010). A review of accelerometry-based wearable motion detectors for physical activity monitoring. Sensors 10 7772-7788.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.