
Dimension reduction and estimation in the secondary analysis of case-control studies. (English) Zbl 1402.62072

Summary: Studying the relationship between covariates based on retrospective data is the main purpose of secondary analysis, an area of increasing interest. We examine the secondary analysis problem when multiple covariates are available, while only a regression mean model is specified. Although the regression mean function is modeled completely parametrically, the case-control nature of the data requires special treatment, and semiparametrically efficient estimation leads to several nonparametric estimation problems involving multivariate covariates. We devise a dimension reduction approach that is compatible with the primary and secondary models specified in the original problem setting, and we use reweighting to adjust for the case-control nature of the data, even when the disease rate in the source population is unknown. The resulting estimator is both locally efficient and robust against misspecification of the regression error distribution, which can be heteroscedastic as well as non-Gaussian. We demonstrate the advantage of our method over several existing methods, both analytically and numerically.
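The reweighting idea mentioned in the summary can be illustrated, in its simplest form, by inverse sampling weights. The Python sketch below is not the authors' estimator: it assumes the disease rate π in the source population is known, uses a simulated linear secondary mean model E(Y | X) = 1 + 2X with a logistic primary (disease) model, and performs no dimension reduction or efficiency improvement. The function names `case_control_weights` and `weighted_ls` are hypothetical. Each case receives weight π/(n₁/n) and each control weight (1 − π)/(n₀/n), so weighted sample averages target source-population averages.

```python
import numpy as np

def case_control_weights(d, disease_rate):
    """Hypothetical helper: inverse sampling weights for a case-control sample.

    Cases get disease_rate / (fraction of cases in the sample); controls get
    (1 - disease_rate) / (fraction of controls). Assumes disease_rate is known.
    """
    d = np.asarray(d)
    frac_cases = d.mean()
    return np.where(d == 1, disease_rate / frac_cases,
                    (1.0 - disease_rate) / (1.0 - frac_cases))

def weighted_ls(x, y, w):
    """Hypothetical helper: weighted least squares fit of y = b0 + b1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Simulated source population (an assumption for illustration only).
rng = np.random.default_rng(0)
N = 200_000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)        # secondary model: E(Y|X) = 1 + 2X
p = 1.0 / (1.0 + np.exp(-(-4.0 + y)))         # primary model: disease depends on Y
d = rng.binomial(1, p)
rate = d.mean()                               # treated as the known disease rate

# Case-control sampling: equal numbers of cases and controls.
cases = rng.choice(np.flatnonzero(d == 1), 2000, replace=False)
controls = rng.choice(np.flatnonzero(d == 0), 2000, replace=False)
idx = np.concatenate([cases, controls])
xs, ys, ds = x[idx], y[idx], d[idx]

naive = weighted_ls(xs, ys, np.ones_like(xs))               # ignores the design
weighted = weighted_ls(xs, ys, case_control_weights(ds, rate))
print("unweighted:", naive, "reweighted:", weighted)
```

Under these assumptions the reweighted fit should recover coefficients close to (1, 2), while the unweighted fit targets the case-control sample rather than the source population. Handling an unknown disease rate and attaining local efficiency require the more elaborate construction described in the summary.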

MSC:

62G08 Nonparametric regression and quantile regression
62H12 Estimation in multivariate analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
