Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. (English) Zbl 1454.62380

Summary: In certain genetic studies, clinicians and genetic counselors are interested in estimating the cumulative risk of a disease for individuals with and without a rare deleterious mutation. Estimating the cumulative risk is difficult, however, when the estimates are based on family history data. Often the mutation status of many family members is unknown; instead, only estimated probabilities of a patient having a certain mutation status are available. In addition, ages at disease onset are subject to right censoring. Existing methods for estimating the cumulative risk from such family-based data provide estimates only at individual time points, and these estimates are not guaranteed to be monotonic or nonnegative. In this paper, we develop a novel method that combines Expectation-Maximization and isotonic regression to estimate the cumulative risk across the entire support. Our estimator is monotonic, satisfies self-consistent estimating equations and has high power in detecting differences between the cumulative risks of different populations. Application of our estimator to a Parkinson’s disease (PD) study provides the age-at-onset distribution of PD in PARK2 mutation carriers and noncarriers, and reveals a significant difference between the distributions in compound heterozygous carriers and noncarriers, but not between heterozygous carriers and noncarriers.
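To make the EM/isotonic-regression idea concrete, here is a minimal Python sketch under strong simplifying assumptions that are not taken from the paper: there is no right censoring, time is restricted to a fixed grid, and each subject's membership probabilities `p[i][k]` are known. At each grid time it fits a binary mixture by EM to obtain pointwise cumulative-risk estimates, then enforces monotonicity across time with the weighted pool-adjacent-violators algorithm (PAVA). It is a two-stage illustration only and is not claimed to reproduce the estimator developed in the paper. The functions `pava`, `em_pointwise` and `monotone_risk`, as well as the toy data, are hypothetical.

```python
"""Hypothetical sketch: pointwise EM followed by isotonic regression (PAVA).

Simplifying assumptions (not from the paper): no right censoring, a fixed
time grid, and known membership probabilities p[i][k] for each subject i.
"""
from typing import List, Sequence


def pava(values: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted pool-adjacent-violators: nondecreasing least-squares fit."""
    blocks: List[List[float]] = []  # each block: [weighted mean, weight, size]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            pooled = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / pooled, pooled, n1 + n2])
    fitted: List[float] = []
    for mean, _, size in blocks:
        fitted.extend([mean] * int(size))
    return fitted


def em_pointwise(y: Sequence[int], p: Sequence[Sequence[float]],
                 n_iter: int = 200) -> List[float]:
    """EM for P(Y_i = 1) = sum_k p[i][k] * F[k] at a single time point."""
    n_groups = len(p[0])
    F = [0.5] * n_groups
    for _ in range(n_iter):
        num = [0.0] * n_groups
        den = [0.0] * n_groups
        for yi, pi in zip(y, p):
            # E-step: posterior membership of subject i given its indicator Y_i.
            lik = [pi[k] * (F[k] if yi else 1.0 - F[k]) for k in range(n_groups)]
            total = sum(lik)
            if total == 0.0:
                continue
            for k in range(n_groups):
                post = lik[k] / total
                num[k] += post * yi
                den[k] += post
        # M-step: posterior-weighted proportion of onsets by this time.
        F = [num[k] / den[k] if den[k] > 0 else F[k] for k in range(n_groups)]
    return F


def monotone_risk(onset: Sequence[float], p: Sequence[Sequence[float]],
                  grid: Sequence[float]) -> List[List[float]]:
    """Pointwise EM at each grid time, then PAVA across time per population."""
    pointwise = [em_pointwise([1 if ti <= t else 0 for ti in onset], p)
                 for t in grid]
    curves = []
    for k in range(len(p[0])):
        raw = [row[k] for row in pointwise]
        curves.append(pava(raw, [1.0] * len(raw)))  # equal weights per time
    return curves


if __name__ == "__main__":
    # Toy data (hypothetical): column 0 = carrier, column 1 = noncarrier.
    onset = [2, 5, 3, 8, 1, 6, 4, 7]
    p = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9],
         [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]]
    carrier, noncarrier = monotone_risk(onset, p, grid=range(1, 9))
    print([round(x, 3) for x in carrier])
    print([round(x, 3) for x in noncarrier])
```

Because each PAVA output value is a weighted average of a block of its inputs, the fitted curves stay within [0, 1] whenever the pointwise estimates do. For pointwise estimators that can go negative, one would additionally clip the result to [0, 1].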


62P10 Applications of statistics to biology and medical sciences; meta analysis

