
A joint model for multistate disease processes and random informative observation times, with applications to electronic medical records data. (English) Zbl 1419.62384
Summary: Multistate models are used to characterize individuals’ natural histories through diseases with discrete states. Observational data resources based on electronic medical records offer new opportunities for studying such diseases. However, these data consist of observations of the process at discrete sampling times, which may be either pre-scheduled and non-informative or symptom-driven and informative about an individual’s underlying disease status. We have developed a novel joint observation and disease transition model for this setting. The disease process is modeled as a latent continuous-time Markov chain, and the observation process as a Markov-modulated Poisson process whose observation rates depend on the individual’s underlying disease status. The disease process is observed at a combination of informative and non-informative sampling times, with possible misclassification error. We demonstrate that the model is computationally tractable and devise an expectation-maximization algorithm for parameter estimation. Using simulated data, we show that our joint observation and disease transition model yields less biased and more precise estimates of the disease rate parameters. We apply the model to a study of secondary breast cancer events, utilizing mammography and biopsy records from a sample of women with a history of primary breast cancer.
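The data-generating mechanism described in the summary can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation and not their expectation-maximization algorithm: it only draws one subject's latent continuous-time Markov chain path, visit times from a Poisson process whose rate depends on the current latent state (informative visits), a set of pre-scheduled visits (non-informative), and a misclassified reading at each visit. The function name `simulate_subject`, the two-state structure, and all numeric values (`Q`, `obs_rates`, `emission`, `scheduled_times`, `t_end`) are assumptions chosen for illustration and do not come from the paper.

```python
import random


def simulate_subject(Q, obs_rates, emission, scheduled_times, t_end, rng, state=0):
    """Simulate one subject under assumed parameters.

    Q               generator matrix of the latent continuous-time Markov chain
    obs_rates       informative visit rate in each latent state (the MMPP rates)
    emission        emission[i][j] = P(observed state j | true state i)
    scheduled_times pre-scheduled (non-informative) visit times
    Returns a time-sorted list of (time, kind, true_state, observed_state).
    """
    n = len(Q)
    t = 0.0
    visits = []
    while t < t_end:
        exit_rate = -Q[state][state]
        # Exponential sojourn time; an absorbing state is never left.
        sojourn = rng.expovariate(exit_rate) if exit_rate > 0 else float("inf")
        seg_end = min(t + sojourn, t_end)

        # Informative visits: homogeneous Poisson process on [t, seg_end)
        # with the rate attached to the current latent state.
        u = t
        while obs_rates[state] > 0:
            u += rng.expovariate(obs_rates[state])
            if u >= seg_end:
                break
            visits.append((u, "informative", state))

        # Scheduled visits that fall inside this sojourn.
        visits.extend((s, "scheduled", state)
                      for s in scheduled_times if t <= s < seg_end)

        if t + sojourn >= t_end:
            break
        # Jump to a new state with probability proportional to the off-diagonal rates.
        weights = [Q[state][j] if j != state else 0.0 for j in range(n)]
        state = rng.choices(range(n), weights=weights)[0]
        t = seg_end

    visits.sort()
    # Each visit yields a possibly misclassified reading of the true state.
    return [(time, kind, true, rng.choices(range(n), weights=emission[true])[0])
            for time, kind, true in visits]


# Hypothetical two-state example: 0 = disease-free, 1 = diseased (absorbing).
Q = [[-0.1, 0.1],
     [0.0, 0.0]]
obs_rates = [0.2, 2.0]            # symptom-driven visits are more frequent when diseased
emission = [[0.95, 0.05],
            [0.10, 0.90]]         # assumed misclassification probabilities
scheduled_times = [1.0, 2.0, 3.0, 4.0, 5.0]
t_end = 6.0

records = simulate_subject(Q, obs_rates, emission, scheduled_times, t_end,
                           random.Random(2024))
for time, kind, true_state, observed in records:
    print(f"t={time:5.2f}  {kind:<11}  true={true_state}  observed={observed}")
```

In a run of this sketch, informative visits tend to cluster after the latent transition to the diseased state, which is the sense in which visit timing itself carries information about disease status; treating all visit times as non-informative discards that signal.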

MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62M05 Markov processes: estimation; hidden Markov models
Software: numDeriv; R