Semiparametric regression in testicular germ cell data. (English) Zbl 1254.62056

Summary: It is possible to approach regression analysis with random covariates from a semiparametric perspective where information is combined from multiple multivariate sources. The approach assumes a semiparametric density ratio model where multivariate distributions are “regressed” on a reference distribution. A kernel density estimator can be constructed from many data sources in conjunction with the semiparametric model. The estimator is shown to be more efficient than the traditional single-sample kernel density estimator, and its optimal bandwidth is discussed in some detail. Each multivariate distribution and the corresponding conditional expectation (regression) of interest are estimated from the combined data using all sources. Graphical and quantitative diagnostic tools are suggested to assess model validity. The method is applied in quantifying the effect of height and age on weight of germ cell testicular cancer patients. Comparisons are made with multiple regression, generalized additive models (GAM) and nonparametric kernel regression.
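As a rough illustration of the idea in the summary, the following minimal sketch fits a one-dimensional, two-sample density ratio model g1(x) = exp(alpha + beta*x) g(x) and builds kernel density estimates from the pooled data. It is not the authors' implementation. The simulated data, the choice h(x) = x, the logistic-regression route to the estimates, the rule-of-thumb bandwidth, and the function names `fit_density_ratio` and `weighted_kde` are all assumptions made for illustration. The paper itself treats multivariate data and discusses its own optimal bandwidth.

```python
import numpy as np

def fit_density_ratio(ref, other, n_iter=50):
    """Fit g1(x) = exp(alpha + beta*x) * g(x), with g the reference density.

    Assumption: the parameters are obtained from the equivalent logistic
    regression of sample membership on x, fitted by Newton-Raphson.
    """
    n0, n1 = len(ref), len(other)
    t = np.concatenate([ref, other])                      # pooled sample
    y = np.concatenate([np.zeros(n0), np.ones(n1)])       # 1 = non-reference
    X = np.column_stack([np.ones_like(t), t])
    theta = np.zeros(2)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ theta))
        W = pi * (1.0 - pi)
        step = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - pi))
        theta = theta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    alpha = theta[0] - np.log(n1 / n0)   # remove the sample-size offset
    beta = theta[1]
    # Estimated reference probability mass at each pooled observation.
    p = 1.0 / (n0 * (1.0 + np.exp(X @ theta)))
    return alpha, beta, t, p

def weighted_kde(grid, points, weights, bandwidth):
    """Gaussian kernel density estimate with one weight per observation."""
    z = (grid[:, None] - points[None, :]) / bandwidth
    kern = np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return kern @ weights

# Simulated data (hypothetical): N(0,1) reference and N(1,1) second sample,
# for which the true density ratio is exp(-0.5 + x).
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 400)
other = rng.normal(1.0, 1.0, 600)

alpha, beta, pooled, p = fit_density_ratio(ref, other)

# Rule-of-thumb bandwidth (an assumption, not the paper's optimal choice).
bw = 1.06 * pooled.std() * len(pooled) ** (-0.2)
grid = np.linspace(-6.0, 8.0, 701)

# Both densities are estimated from ALL observations, reweighted by the model.
g_ref = weighted_kde(grid, pooled, p, bw)
g_other = weighted_kde(grid, pooled, p * np.exp(alpha + beta * pooled), bw)
```

Each density estimate here draws on every observation in the pooled sample, which is where the efficiency gain over a single-sample kernel estimator described in the summary comes from.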


62G08 Nonparametric regression and quantile regression
62G07 Density estimation
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C50 Medical applications (general)
62H12 Estimation in multivariate analysis
65C60 Computational problems in statistics (MSC2010)
62A09 Graphical methods in statistics



