The role of statistics in the era of big data: electronic health records for healthcare research. (English) Zbl 06892176

Summary: The transfer of medical records into large electronic databases has opened up new opportunities for research, but exploiting them requires attention to data quality, study design, and issues of bias and confounding.


MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62R07 Statistical aspects of big data and data science
92C50 Medical applications (general)


Software: mstate; flexsurv; msm


[1] Austin, P. C.; Stuart, E. A., Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies, Stat. Med., 34, 3661-3679, (2015)
[2] Belin, T. R.; Rubin, D. B., A method for calibrating false-match rates in record linkage, J. Amer. Statist. Assoc., 90, 694-707, (1995) · Zbl 0925.62548
[3] Benchimol, E. I.; Smeeth, L.; Guttmann, A.; Harron, K.; Moher, D.; Petersen, I.; Sorensen, H. T.; Von Elm, E.; Langan, S. M.; RECORD Working Committee, The reporting of studies conducted using observational routinely-collected health data (RECORD) statement, PLoS Med., 12, e1001885, (2015)
[4] Carpenter, J. R.; Kenward, M. G., Multiple imputation and its application, Statistics in Practice, (2013), Wiley, Chichester
[5] de Wreede, L. C.; Fiocco, M.; Putter, H., mstate: an R package for the analysis of competing risks and multi-state models, J. Stat. Softw., 38, (2011)
[6] Dusetzina, S. B.; Tyree, S.; Meyer, A. M.; Meyer, A.; Green, L.; Carpenter, W. R., Linking data for health services research: a framework and instructional guide, Agency for Healthcare Research and Quality, (2014), The University of North Carolina at Chapel Hill, Chapel Hill, NC
[7] Elfeky, M. G.; Verykios, V. S.; Elmagarmid, A. K.; Ghanem, T. M.; Huwait, A. R., Record linkage: a machine learning approach, a toolbox, and a digital government web service, Purdue e-Pubs Computer Science Technical Reports, Purdue University, USA, (2003)
[8] Fröbert, O.; Lagerqvist, B.; Gudnason, T.; Thuesen, L.; Svensson, R.; Olivecrona, G. K.; James, S. K., Thrombus aspiration in ST-elevation myocardial infarction in Scandinavia (TASTE trial). A multicenter, prospective, randomized, controlled clinical registry trial based on the Swedish angiography and angioplasty registry (SCAAR) platform. Study design and rationale, Am. Heart J., 160, 1042-1048, (2010)
[9] Geneletti, S.; O’Keeffe, A. G.; Sharples, L. D.; Richardson, S.; Baio, G., Bayesian regression discontinuity designs: incorporating clinical knowledge in the causal analysis of primary care data, Stat. Med., 34, 2334-2352, (2015)
[10] Goldstein, H., Multilevel statistical models, (1987), Wiley · Zbl 0619.62064
[11] Gruber, S.; van der Laan, M. J., Targeted maximum likelihood estimation: a gentle introduction, U.C. Berkeley Division of Biostatistics Working Paper Series, University of California, Berkeley, (2009)
[12] Grüger, J.; Kay, R.; Schumacher, M., The validity of inferences based on incomplete observations in disease state models, Biometrics, 47, 595-605, (1991)
[13] Harron, K.; Wade, A.; Gilbert, R.; Muller-Pebody, B.; Goldstein, H., Evaluating bias due to data linkage error in electronic healthcare records, BMC Med. Res. Methodol., 14, 36, (2014)
[14] Harron, K.; Goldstein, H.; Dibben, C., Methodological developments in data linkage, (2015), Wiley
[15] Herranz, J.; Nin, J.; Rodríguez, P.; Tassa, T., Revisiting distance-based record linkage for privacy-preserving release of statistical datasets, Data Knowl. Eng., 100, (2015)
[16] Hurley, C.; Shiely, F.; Power, J.; Clarke, M.; Eustace, J. A.; Flanagan, E.; Kearney, P. M., Risk based monitoring (RBM) tools for clinical trials: A systematic review, Contemp. Clin. Trials, 51, 15-27, (2016)
[17] Ieva, F.; Jackson, C. H.; Sharples, L. D., Multi-state modelling of repeated hospitalisation and death in patients with heart failure: the use of large administrative databases in clinical epidemiology, Stat. Methods Med. Res., 26, 1350-1372, (2017)
[18] Jackson, C. H., Multi-state models for panel data: the msm package for R, J. Stat. Softw., 38, (2011)
[19] Jackson, C. H., flexsurv: a platform for parametric survival modeling in R, J. Stat. Softw., 70, (2016)
[20] Jackson, C. H.; Bojke, L.; Thompson, S. G.; Claxton, K.; Sharples, L. D., A framework for addressing structural uncertainty in decision models, Med. Decis. Making, 31, 662-674, (2011)
[21] Leyrat, C.; Seaman, S. R.; White, I. R.; Douglas, I.; Smeeth, L.; Kim, J.; Resche-Rigon, M.; Carpenter, J. R.; Williamson, E. J., Propensity score analysis with partially observed covariates: how should multiple imputation be used?, Stat. Methods Med. Res., (2017), 962280217713032
[22] NICE, Guide to the methods of technology appraisal 2013, National Institute for Health and Care Excellence, (2013)
[23] Olsen, R.; Bell, S.; Orr, L.; Stuart, E. A., External validity in policy evaluations that choose sites purposively, J. Policy Anal. Manag., 32, 107-121, (2013)
[24] Pearl, J.; Glymour, M.; Jewell, N. P., Causal inference in statistics: A primer, (2016), Wiley · Zbl 1332.62001
[25] Relton, C.; Torgerson, D.; O’Cathain, A.; Nicholl, J., Rethinking pragmatic randomised controlled trials: introducing the “cohort multiple randomised controlled trial” design, BMJ, 340, c1066, (2010)
[26] Robins, J., A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect, Math. Model., 7, 1393-1512, (1986) · Zbl 0614.62136
[27] Schneeweiss, S.; Rassen, J. A.; Glynn, R. J.; Avorn, J.; Mogun, H.; Brookhart, M. A., High-dimensional propensity score adjustment in studies of treatment effects using health care claims data, Epidemiology, 20, 512-522, (2009)
[28] Davey Smith, G.; Ebrahim, S., ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease?, Int. J. Epidemiol., 32, 1-22, (2003)
[29] Stuart, E. A., Matching methods for causal inference: a review and a look forward, Statist. Sci., 25, 1-21, (2010) · Zbl 1328.62007
[30] Stuart, E. A.; Cole, S. R.; Bradshaw, C. P.; Leaf, P. J., The use of propensity scores to assess the generalizability of results from randomized trials, J. R. Stat. Soc. Ser. A, 174, 369-386, (2011)
[31] Welch, C. A.; Petersen, I.; Bartlett, J. W.; White, I. R.; Marston, L.; Morris, R. W.; Nazareth, I.; Walters, K.; Carpenter, J., Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data, Stat. Med., 33, 3725-3737, (2014)
[32] Williamson, E.; Morley, R.; Lucas, A.; Carpenter, J., Propensity scores: from naive enthusiasm to intuitive understanding, Stat. Methods Med. Res., 21, 273-293, (2012) · Zbl 1242.62124
[33] Winkler, W. E., Overview of record linkage and current research directions, Research Report Series, Statistical Research Division, U.S. Census Bureau, Washington, DC, (2006)
[34] Young, R. C., Cancer clinical trials: a chronic but curable crisis, N. Engl. J. Med., 363, 306-309, (2010)
[35] Zeng, L.; Cook, R. J.; Wen, L.; Boruvka, A., Bias in progression-free survival analysis due to intermittent assessment of progression, Stat. Med., 34, 3181-3193, (2015)
[36] Zhu, Y.; Matsuyama, Y.; Ohashi, Y.; Setoguchi, S., When to conduct probabilistic linkage vs. deterministic linkage? A simulation study, J. Biomed. Inform., 56, 80-86, (2015)