Liu, Lei; Shih, Ya-Chen Tina; Strawderman, Robert L.; Zhang, Daowen; Johnson, Bankole A.; Chai, Haitao
Statistical analysis of zero-inflated nonnegative continuous data: a review. (English) Zbl 1420.62457
Stat. Sci. 34, No. 2, 253-279 (2019).

Summary: Zero-inflated nonnegative continuous (or semicontinuous) data arise frequently in biomedical, economical, and ecological studies. Examples include substance abuse, medical costs, medical care utilization, biomarkers (e.g., CD4 cell counts, coronary artery calcium scores), single cell gene expression rates, and (relative) abundance of microbiome. Such data are often characterized by the presence of a large portion of zero values and positive continuous values that are skewed to the right and heteroscedastic. Both of these features suggest that no simple parametric distribution may be suitable for modeling such type of outcomes. In this paper, we review statistical methods for analyzing zero-inflated nonnegative outcome data. We will start with the cross-sectional setting, discussing ways to separate zero and positive values and introducing flexible models to characterize right skewness and heteroscedasticity in the positive values. We will then present models of correlated zero-inflated nonnegative continuous data, using random effects to tackle the correlation on repeated measures from the same subject and that across different parts of the model. We will also discuss expansion to related topics, for example, zero-inflated count and survival data, nonlinear covariate effects, and joint models of longitudinal zero-inflated nonnegative continuous data and survival. Finally, we will present applications to three real datasets (i.e., microbiome, medical costs, and alcohol drinking) to illustrate these methods. Example code will be provided to facilitate applications of these methods.
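The idea of separating zero and positive values in the cross-sectional setting can be illustrated with a minimal two-part model sketch. This is not the code from the paper. It assumes a logistic regression for the probability of a positive value and an ordinary least squares fit on the log of the positive values, with Duan's smearing factor used for retransformation under homoscedastic errors. The function names (`fit_logistic`, `fit_two_part`, `predict_mean`), the simulated data, and the "true" coefficients are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_logistic(X, z, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (z - p)               # score vector
        info = X.T @ (X * w[:, None])      # observed information X'WX
        beta = beta + np.linalg.solve(info, grad)
    return beta


def fit_two_part(X, y):
    """Part 1: P(Y > 0 | x) by logistic regression.
    Part 2: log(Y) | Y > 0 by least squares, plus a smearing factor."""
    positive = y > 0
    alpha = fit_logistic(X, positive.astype(float))
    Xp, log_y = X[positive], np.log(y[positive])
    gamma, *_ = np.linalg.lstsq(Xp, log_y, rcond=None)
    resid = log_y - Xp @ gamma
    smear = np.mean(np.exp(resid))         # assumes homoscedastic log-scale errors
    return alpha, gamma, smear


def predict_mean(X, alpha, gamma, smear):
    """E[Y | x] = P(Y > 0 | x) * E[Y | Y > 0, x]."""
    p_pos = 1.0 / (1.0 + np.exp(-X @ alpha))
    return p_pos * np.exp(X @ gamma) * smear


# Simulated semicontinuous data (illustrative assumption, not real data).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_alpha = np.array([0.5, 1.0])          # hypothetical Part 1 coefficients
true_gamma = np.array([1.0, 0.5])          # hypothetical Part 2 coefficients
is_pos = rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_alpha))
y = np.where(is_pos, np.exp(X @ true_gamma + rng.normal(scale=0.5, size=n)), 0.0)

alpha_hat, gamma_hat, smear = fit_two_part(X, y)
mean_hat = predict_mean(X, alpha_hat, gamma_hat, smear)
```

The two parts are fitted separately here, which is valid only when they share no parameters or random effects. The correlated-data models mentioned in the summary link the parts through random effects and would require a joint likelihood instead.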
Cited in 15 Documents
MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62J02 General nonlinear regression
Keywords: two-part model; tobit model; health econometrics; semiparametric regression; joint model; cure rate; frailty model; splines
Software: MAST; GAMLSS; ISLR; timereg; SASmixed