Conditional functional clustering for longitudinal data with heterogeneous nonlinear patterns. (English) Zbl 1498.62264

Summary: In studies of cognitive aging, it is crucial to distinguish subtypes of longitudinal cognitive change while accounting for the effects of given covariates. Both the longitudinal cognition trajectories and the covariate effects can be nonlinear, with heterogeneous shapes that do not follow a simple parametric form, so flexible functional methods are preferred. However, most functional clustering methods for longitudinal data do not allow controlling for the possible functional effects of covariates. Although traditional mixture-of-experts methods can include covariates and can be extended to the functional setting using nonlinear basis functions, satisfactory parsimonious functional methods, which are required for robust functional coefficient estimation and clustering, are still lacking. In this paper we propose a novel latent class functional mixed-effects model in which the covariates are assumed to have fixed functional effects and the random curves follow a mixture of Gaussian processes, which facilitates model-based conditional clustering. A transformed penalized B-spline approach is employed for parsimonious modeling and robust model estimation. We propose a new iterative-REML method to choose the penalty parameters in heterogeneous data. The new method is applied to the latest data from the Religious Orders Study and the Rush Memory and Aging Project, and four novel subtypes of cognitive change are identified.
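
The idea of model-based conditional clustering mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the covariate effect and the class mean curves are already known, and it replaces the Gaussian process covariance with independent Gaussian noise. The function name `class_posteriors` and all numbers are hypothetical. The sketch only shows how a subject's covariate-adjusted curve yields posterior class probabilities, which is the E-step of a mixture model.

```python
import math

def class_posteriors(y, x_effect, class_means, weights, sigma):
    """Posterior class probabilities for one subject's curve.

    y           : observed values at the subject's visit times
    x_effect    : fixed covariate effect at those times (assumed known here)
    class_means : one mean curve per latent class, at the same times
    weights     : prior mixing proportions, positive and summing to 1
    sigma       : residual standard deviation (independent noise, a
                  simplification of a Gaussian process covariance)
    """
    log_post = []
    for mean, w in zip(class_means, weights):
        # Gaussian log-likelihood of the covariate-adjusted residuals.
        ll = 0.0
        for yi, xi, mi in zip(y, x_effect, mean):
            r = yi - xi - mi
            ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - r * r / (2 * sigma ** 2)
        log_post.append(math.log(w) + ll)
    # Normalise with the log-sum-exp trick for numerical stability.
    top = max(log_post)
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy data: all values below are made up for illustration.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
x_effect = [0.1 * t for t in times]          # hypothetical covariate effect
stable = [0.0 for _ in times]                # hypothetical "stable" class mean
decline = [-0.05 * t ** 2 for t in times]    # hypothetical nonlinear decline
y = [0.02, 0.03, 0.01, -0.12, -0.45]         # toy observations

post = class_posteriors(y, x_effect, [stable, decline], [0.5, 0.5], sigma=0.1)
label = post.index(max(post))                # index of the most probable class
```

On this toy curve, almost all posterior mass falls on the second (decline) class, because subtracting the covariate effect exposes a curved downward trend that the raw observations partly hide. In a full EM fit these probabilities would be recomputed at each iteration as the class curves and the covariate effect are re-estimated.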

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
62R10 Functional data analysis
