Logistic regression analysis with standardized markers. (English) Zbl 1454.62344

Summary: Two different approaches to the analysis of data from diagnostic biomarker studies are commonly employed. Logistic regression is used to fit models for the probability of disease given marker values, while ROC curves and risk distributions are used to evaluate classification performance. In this paper we present a method that simultaneously accomplishes both tasks. The key step is to standardize markers relative to the nondiseased population before including them in the logistic regression model. Among the advantages of this method are the following: (i) ensuring that results from regression and performance assessments are consistent with each other; (ii) allowing covariate adjustment and covariate effects on ROC curves to be handled in a familiar way; and (iii) providing a mechanism to incorporate important assumptions about structure in the ROC curve into the fitted risk model. We develop the method in detail for the problem of combining biomarker data sets derived from multiple studies, populations or biomarker measurement platforms, when ROC curves are similar across data sources. The methods are applicable to both cohort and case-control sampling designs. The data set motivating this application concerns Prostate Cancer Antigen 3 (PCA3) for diagnosis of prostate cancer in patients with or without previous negative biopsy, where the ROC curves for PCA3 are found to be the same in the two populations. Estimated constrained maximum likelihood and empirical likelihood estimators are derived. The estimators are compared in simulation studies and the methods are illustrated with the PCA3 data set.
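The key step described in the summary can be illustrated with a minimal sketch. It assumes that "standardizing relative to the nondiseased population" means replacing each marker value by its percentile in the control sample (the empirical control CDF), and it fits an ordinary, unconstrained logistic regression by Newton-Raphson. It does not implement the estimated constrained maximum likelihood or empirical likelihood estimators of the paper. The function names (`standardize`, `empirical_roc`, `fit_logistic`) and the simulated Gaussian data are hypothetical and chosen only for illustration.

```python
import bisect
import math
import random


def standardize(values, controls):
    """Percentile of each value within the control (nondiseased) sample."""
    ref = sorted(controls)
    return [bisect.bisect_right(ref, v) / len(ref) for v in values]


def empirical_roc(case_pv, fpr):
    """Approximate TPR at a given FPR: the share of cases whose
    standardized value exceeds the (1 - fpr) control percentile."""
    return sum(u > 1.0 - fpr for u in case_pv) / len(case_pv)


def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson for logit P(D = 1 | x) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated marker values (an assumption, not data from the paper):
# controls ~ N(0, 1), cases ~ N(1, 1).
random.seed(0)
controls = [random.gauss(0.0, 1.0) for _ in range(200)]
cases = [random.gauss(1.0, 1.0) for _ in range(200)]

# Standardize everyone against the controls, then fit the risk model.
case_pv = standardize(cases, controls)
control_pv = standardize(controls, controls)
labels = [1] * len(cases) + [0] * len(controls)
b0, b1 = fit_logistic(case_pv + control_pv, labels)

# The same standardized values give the ROC curve directly.
tpr_at_20 = empirical_roc(case_pv, 0.2)
```

Because the standardized values of the controls are roughly uniform on [0, 1], the distribution of the cases' standardized values determines the ROC curve, so the regression and the performance assessment are computed on the same scale. In a case-control sample the fitted intercept `b0` reflects the sampling ratio rather than the population prevalence.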


62P10 Applications of statistics to biology and medical sciences; meta analysis
62G05 Nonparametric estimation
62J12 Generalized linear models (logistic models)


[1] Alonzo, T. A. and Pepe, M. S. (2002). Distribution-free ROC analysis using binary regression techniques. Biostatistics 3 421-432. · Zbl 1135.62390
[2] Bura, E. and Gastwirth, J. L. (2001). The binary regression quantile plot: Assessing the importance of predictors in binary regression visually. Biom. J. 43 5-21. · Zbl 0997.62053
[3] Cai, T. and Zheng, Y. (2007). Model checking for ROC regression analysis. Biometrics 63 152-163, 312-313. · Zbl 1123.62080
[4] Campbell, G. and Ratnaparkhi, M. V. (1993). An application of Lomax distributions in receiver operating characteristic (ROC) curve analysis. Communications in Statistics 22 1681-1697. · Zbl 0800.62721
[5] Deras, I. L., Aubin, S. M. J., Blase, A., Day, J. R., Koo, S., Partin, A. W., Ellis, W. J., Marks, L. S., Fradet, Y., Rittenhouse, H. and Groskopf, J. (2008). PCA3: A molecular urine assay for predicting prostate biopsy outcome. J. Urol. 179 1587-1592.
[6] Dodd, L. E. and Pepe, M. S. (2003). Semiparametric regression for the area under the receiver operating characteristic curve. J. Amer. Statist. Assoc. 98 409-417. · Zbl 1041.62087
[7] Dorfman, D. D., Berbaum, K. S., Metz, C. E., Lenth, R. V., Hanley, J. A. and Dagga, H. A. (1997). Proper receiver operating characteristic analysis: The bigamma model. Academic Radiology 4 138-149.
[8] Egan, J. P. (1975). Signal Detection Theory and ROC Analysis. Academic Press, New York.
[9] Frisancho, A. R. (1990). Anthropometric Standards for the Assessment of Growth and Nutritional Status. Univ. Michigan Press, Ann Arbor.
[10] Gu, W. and Pepe, M. S. (2010). Estimating the diagnostic likelihood ratio of a continuous marker. Biostatistics 12 87-101.
[11] Hanley, J. A. and Hajian-Tilaki, K. O. (1997). Sampling variability of nonparametric estimate of the areas under receiver operating characteristic curves: An update. Academic Radiology 4 49-58.
[12] Hosmer, D. W. and Lemeshow, S. (1980). Goodness of fit tests for the multiple logistic regression model. Comm. Statist. Theory Methods 9 1043-1069. · Zbl 0447.62025
[13] Huang, Y. (2007). Evaluating the predictiveness of continuous biomarkers. Ph.D. thesis, Univ. Washington.
[14] Huang, Y., Pepe, M. S. and Feng, Z. (2007). Evaluating the predictiveness of a continuous marker. Biometrics 63 1181-1188, 1313. · Zbl 1136.62078
[15] Huang, Y. and Pepe, M. S. (2009a). Biomarker evaluation using the controls as a reference population. Biostatistics 10 228-244.
[16] Huang, Y. and Pepe, M. S. (2009b). A parametric ROC model-based approach for evaluating the predictiveness of continuous markers in case-control studies. Biometrics 65 1133-1144. · Zbl 1181.62181
[17] Huang, Y. and Pepe, M. S. (2009c). Semiparametric methods for evaluating risk prediction markers in case-control studies. Biometrika 96 991-997. · Zbl 1178.62119
[18] Huang, Y. and Pepe, M. S. (2010a). Semiparametric methods for evaluating the covariate-specific predictiveness of continuous markers in matched case-control studies. J. R. Stat. Soc. Ser. C. Appl. Stat. 59 437-456.
[19] Huang, Y. and Pepe, M. S. (2010b). Assessing risk prediction models in case-control studies using semiparametric and nonparametric methods. Stat. Med. 29 1391-1410.
[20] Huang, Y., Pepe, M. S. and Feng, Z. (2013). Supplement to “Logistic regression analysis with standardized markers.” · Zbl 1454.62344
[21] Janes, H. and Pepe, M. S. (2008). Adjusting for covariates in studies of diagnostic, screening, or prognostic markers: An old concept in a new setting. Am. J. Epidemiol. 168 89-97.
[22] Janes, H. and Pepe, M. S. (2009). Adjusting for covariate effects on classification accuracy using the covariate-adjusted receiver operating characteristic curve. Biometrika 96 371-382. · Zbl 1163.62046
[23] Metz, C. E. and Pan, X. (1999). “Proper” binormal ROC curves: Theory and maximum-likelihood estimation. J. Math. Psych. 43 1-33. · Zbl 0920.62138
[24] Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford Statistical Science Series 28. Oxford Univ. Press, Oxford. · Zbl 1039.62105
[25] Pepe, M. S. and Cai, T. (2004). The analysis of placement values for evaluating discriminatory measures. Biometrics 60 528-535. · Zbl 1274.62173
[26] Pepe, M. S., Etzioni, R., Feng, Z., Potter, J. D., Thompson, M. L., Thornquist, M., Winget, M. and Yasui, Y. (2001). Phases of biomarker development for early detection of cancer. J. Natl. Cancer Inst. 93 1054-1061.
[27] Pepe, M. S., Feng, Z., Huang, Y., Longton, G. M., Prentice, R., Thompson, I. M. and Zheng, Y. (2008). Integrating the predictiveness of a marker with its performance as a classifier. Am. J. Epidemiol. 167 362-368.
[28] Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations. Ann. Statist. 22 300-325. · Zbl 0799.62049
[29] Qin, J. and Zhang, B. (1997). A goodness-of-fit test for logistic regression models based on case-control data. Biometrika 84 609-618. · Zbl 0888.62045
[30] Qin, J. and Zhang, B. (2003). Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika 90 585-596. · Zbl 1436.62620