Scalar-on-function regression for predicting distal outcomes from intensively gathered longitudinal data: interpretability for applied scientists. (English) Zbl 1434.62185

Summary: Researchers are sometimes interested in predicting a distal or external outcome (such as smoking cessation at follow-up) from the trajectory of an intensively recorded longitudinal variable (such as urge to smoke). This can be done in a semiparametric way via scalar-on-function regression. However, the resulting fitted regression coefficient function requires special care for correct interpretation, as it represents the joint relationship of all time points to the outcome, rather than a marginal or cross-sectional relationship at each time point. We provide practical guidelines, based on experience with scientific applications, to help practitioners interpret their results, and we illustrate these ideas using data from a smoking cessation study.
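
As an illustrative sketch (the notation below is assumed here and is not taken from the paper), a scalar-on-function logistic regression for a binary outcome \(y_i\) and a trajectory \(x_i(t)\) observed on \(t \in [0, T]\) might be written as
\[ \operatorname{logit} P(y_i = 1) = \beta_0 + \int_0^T x_i(t)\, \beta(t)\, dt, \]
whereas a marginal (cross-sectional) analysis would fit a separate model at each time point,
\[ \operatorname{logit} P(y_i = 1) = \alpha_0(t) + \alpha_1(t)\, x_i(t). \]
Under this reading, \(\beta(t)\) plays the role of a multiple-regression coefficient, adjusted for the rest of the trajectory, while \(\alpha_1(t)\) is analogous to a simple-regression slope, so the two functions can differ in shape and even in sign.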


62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62G08 Nonparametric regression and quantile regression
62H25 Factor analysis and principal components; correspondence analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H11 Directional data; spatial statistics

