zbMATH — the first resource for mathematics

Function-on-scalar quantile regression with application to mass spectrometry proteomics data. (English) Zbl 1446.62355
Summary: Mass spectrometry proteomics, characterized by spiky, spatially heterogeneous functional data, can be used to identify potential cancer biomarkers. Existing mass spectrometry analyses use mean regression to detect spectral regions that are differentially expressed across groups. However, given the interpatient heterogeneity that is a key hallmark of cancer, many biomarkers are present at aberrant levels in only a subset of cancer samples rather than in all of them. Differences in such biomarkers can easily be missed by mean regression but may be detected more readily by quantile-based approaches. We therefore propose a unified Bayesian framework for quantile regression on functional responses. Our approach uses an asymmetric Laplace working likelihood, represents the functional coefficients through basis expansions that borrow strength from nearby locations, and places a global-local shrinkage prior on the basis coefficients to achieve adaptive regularization. Different types of basis transforms and continuous shrinkage priors can be used within the framework. A scalable Gibbs sampler is developed to generate posterior samples that can be used for Bayesian estimation and inference while accounting for multiple testing. Because the framework performs quantile regression and coefficient regularization jointly, each informs the other, which, as simulation studies demonstrate, improves performance over competing methods. We also introduce an adjustment procedure that improves the frequentist properties of the model's posterior inference. We apply the model to identify proteomic biomarkers of pancreatic cancer that are differentially expressed in a subset of cancer patients relative to normal controls and that were missed by previous mean-regression-based approaches. Supplementary Material for this article is available online.
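The summary names the ingredients of the sampler (an asymmetric Laplace working likelihood and Gibbs updates) without giving them. A minimal sketch of how such a likelihood admits closed-form Gibbs steps is below. It assumes a scalar response, a plain linear predictor, the asymmetric Laplace scale fixed at 1, and a vague Gaussian prior in place of the paper's basis transform and global-local shrinkage prior. It uses the standard normal-exponential mixture representation of the asymmetric Laplace distribution and is not the authors' sampler. The function name `bayes_qr_gibbs`, its defaults, and the `1e-10` residual guard are hypothetical choices made for illustration.

```python
import numpy as np


def bayes_qr_gibbs(X, y, tau, n_iter=2000, burn=500, prior_var=100.0, seed=0):
    """Gibbs sampler for linear quantile regression at level tau.

    Hypothetical sketch: asymmetric Laplace working likelihood with scale
    fixed at 1, independent N(0, prior_var) priors on the coefficients.
    Returns the post-burn-in draws of beta, shape (n_iter - burn, p).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Mixture representation: residual u = theta * v + sqrt(psi2 * v) * z,
    # with v ~ Exp(1) and z ~ N(0, 1), reproduces the asymmetric Laplace law.
    theta = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    psi2 = 2.0 / (tau * (1.0 - tau))
    prior_prec = np.eye(p) / prior_var
    beta = np.zeros(p)
    v = np.ones(n)  # latent exponential mixing variables
    draws = []
    for it in range(n_iter):
        # beta | v, y is Gaussian: y_i | beta, v_i ~ N(x_i'beta + theta*v_i, psi2*v_i)
        w = 1.0 / (psi2 * v)
        prec = X.T @ (X * w[:, None]) + prior_prec
        cov = np.linalg.inv(prec)
        cov = 0.5 * (cov + cov.T)  # enforce exact symmetry
        mean = cov @ (X.T @ (w * (y - theta * v)))
        beta = rng.multivariate_normal(mean, cov)
        # 1 / v_i | beta, y is inverse Gaussian, i.e. numpy's Wald distribution
        resid = np.maximum(np.abs(y - X @ beta), 1e-10)  # guard against u_i == 0
        c = theta**2 + 2.0 * psi2
        v = 1.0 / rng.wald(np.sqrt(c) / resid, c / psi2)
        if it >= burn:
            draws.append(beta)
    return np.array(draws)


if __name__ == "__main__":
    # Simulated homoscedastic example: with standard normal errors the true
    # tau-quantile line is (1 + Phi^{-1}(tau)) + 2 * x.
    rng = np.random.default_rng(1)
    n = 500
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    for tau in (0.5, 0.9):
        print(tau, bayes_qr_gibbs(X, y, tau).mean(axis=0))
```

Fixing the scale at 1 keeps both conditionals conjugate and makes the asymmetric Laplace density a pure working likelihood whose log is, up to a constant, the negative check loss. Extending the sketch toward the framework in the summary would require a basis representation of the functional coefficients and a shrinkage prior such as the horseshoe, both of which are deliberately omitted here.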
62R10 Functional data analysis
62G08 Nonparametric regression and quantile regression
62P10 Applications of statistics to biology and medical sciences; meta analysis
[1] Baggerly, K. A., Morris, J. S. and Coombes, K. R. (2004). Reproducibility of SELDI-TOF protein patterns in serum: Comparing datasets from different experiments. Bioinformatics 20 777-785.
[2] Baggerly, K. A., Morris, J. S., Wang, J., Gold, D., Xiao, L.-C. and Coombes, K. R. (2003). A comprehensive approach to the analysis of matrix-assisted laser desorption/ionization-time of flight proteomics spectra from serum samples. Proteomics 3 1667-1672.
[3] Bhattacharya, A., Pati, D., Pillai, N. S. and Dunson, D. B. (2015). Dirichlet-Laplace priors for optimal shrinkage. J. Amer. Statist. Assoc. 110 1479-1490. · Zbl 1373.62368
[4] Brockhaus, S. and Ruegamer, D. (2017). FDboost: Boosting functional regression models.
[5] Brockhaus, S., Scheipl, F., Hothorn, T. and Greven, S. (2015). The functional linear array model. Stat. Model. 15 279-300.
[6] Cai, Z. and Xu, X. (2008). Nonparametric quantile estimations for dynamic smooth coefficient models. J. Amer. Statist. Assoc. 103 1595-1608. · Zbl 1286.62029
[7] Cardot, H., Crambes, C. and Sarda, P. (2005). Quantile regression when the covariates are functions. J. Nonparametr. Stat. 17 841-856. · Zbl 1077.62026
[8] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2009). Handling sparsity via the horseshoe. In Artificial Intelligence and Statistics 73-80.
[9] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465-480. · Zbl 1406.62021
[10] Chen, K. and Müller, H.-G. (2012). Conditional quantile analysis when covariates are functions, with application to growth data. J. R. Stat. Soc. Ser. B. Stat. Methodol. 74 67-89. · Zbl 1411.62095
[11] Coombes, K. R., Tsavachidis, S., Morris, J. S., Baggerly, K. A., Hung, M.-C. and Kuerer, H. M. (2005). Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics 5 4107-4117.
[12] Deutsch, E. W., Lam, H. and Aebersold, R. (2008). Data analysis and bioinformatics tools for tandem mass spectrometry in proteomics. Physiol. Genomics 33 18-25.
[13] Fasiolo, M., Goude, Y., Nedellec, R. and Wood, S. N. (2018). Fast calibrated additive quantile regression. Preprint. Available at arXiv:1707.03307.
[14] Feng, X. and Zhu, L. (2016). Estimation and testing of varying coefficients in quantile regression. J. Amer. Statist. Assoc. 111 266-274.
[15] Ferraty, F., Rabhi, A. and Vieu, P. (2005). Conditional quantiles for dependent functional data with application to the climatic El Niño phenomenon. Sankhyā 67 378-398. · Zbl 1192.62104
[16] Gasteiger, E., Hoogland, C., Gattiker, A., Wilkins, M. R., Appel, R. D., Bairoch, A. et al. (2005). Protein identification and analysis tools on the ExPASy server. In The Proteomics Protocols Handbook 571-607. Springer, Berlin.
[17] Geraci, M. and Bottai, M. (2006). Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics 8 140-154. · Zbl 1170.62380
[18] Guo, J.-C., Li, J., Zhou, L., Yang, J.-Y., Zhang, Z.-G., Liang, Z.-Y., Zhou, W.-X., You, L., Zhang, T.-P. et al. (2016). CXCL12-CXCR7 axis contributes to the invasive phenotype of pancreatic cancer. Oncotarget 7 62006-62018.
[19] Innocenti, F., Owzar, K., Cox, N. L., Evans, P., Kubo, M., Zembutsu, H., Jiang, C., Hollis, D., Mushiroda, T. et al. (2012). A genome-wide association study of overall survival in pancreatic cancer patients treated with gemcitabine in CALGB 80303. Clin. Cancer Res. 18 577-584.
[20] James, G. M., Wang, J. and Zhu, J. (2009). Functional linear regression that’s interpretable. Ann. Statist. 37 2083-2108. · Zbl 1171.62041
[21] Kato, K. (2012). Estimation in functional linear quantile regression. Ann. Statist. 40 3108-3136. · Zbl 1296.62104
[22] Kim, M.-O. (2007). Quantile regression with varying coefficients. Ann. Statist. 35 92-108. · Zbl 1114.62051
[23] Kinter, M. and Sherman, N. E. (2005). Protein Sequencing and Identification Using Tandem Mass Spectrometry 9. Wiley, New York.
[24] Koenker, R. (2005). Quantile Regression. Econometric Society Monographs 38. Cambridge Univ. Press, Cambridge.
[25] Koenker, R. (2017). quantreg: Quantile regression. R package version 5.33.
[26] Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica 46 33-50. · Zbl 0373.62038
[27] Koomen, J. M., Shih, L. N., Coombes, K. R., Li, D., Xiao, L., Fidler, I. J., Abbruzzese, J. L. and Kobayashi, R. (2005). Plasma protein profiling for diagnosis of pancreatic cancer reveals the presence of host response proteins. Clin. Cancer Res. 11 1110-1118.
[28] Li, M., Wang, K., Maity, A. and Staicu, A.-M. (2016). Inference in functional linear quantile regression. Preprint. Available at arXiv:1602.08793.
[29] Liao, H., Moschidis, E., Riba-Garcia, I., Zhang, Y., Unwin, R. D., Morris, J. S., Graham, J. and Dowsey, A. W. (2014). A new paradigm for clinical biomarker discovery and screening with mass spectrometry through biomedical image analysis principles. In 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI) 1332-1335. IEEE, Piscataway, NJ.
[30] Liu, Y., Li, M. and Morris, J. S. (2020). Supplement to “Function-on-scalar quantile regression with application to mass spectrometry proteomics data.” https://doi.org/10.1214/19-AOAS1319SUPPA, https://doi.org/10.1214/19-AOAS1319SUPPB.
[31] Lum, K. and Gelfand, A. E. (2012). Spatial quantile multiple regression using the asymmetric Laplace process. Bayesian Anal. 7 235-258. · Zbl 1330.62197
[32] MATLAB (2016). Version 9.1 (R2016b). The MathWorks Inc., Natick, MA.
[33] Meyer, M. J., Coull, B. A., Versace, F., Cinciripini, P. and Morris, J. S. (2015). Bayesian function-on-function regression for multilevel functional data. Biometrics 71 563-574. · Zbl 1419.62408
[34] Morris, J. S. (2012). Statistical methods for proteomic biomarker discovery based on feature extraction or functional modeling approaches. Stat. Interface 5 117-135. · Zbl 1245.62154
[35] Morris, J. S. (2015). Functional regression. Annu. Rev. Stat. Appl. 2 321-359.
[36] Morris, J. S. (2017). Comparison and contrast of two general functional regression modelling frameworks [Discussion of MR3619335]. Stat. Model. 17 59-85.
[37] Morris, J. S. and Carroll, R. J. (2006). Wavelet-based functional mixed models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 179-199. · Zbl 1110.62053
[38] Morris, J. S., Brown, P. J., Herrick, R. C., Baggerly, K. A. and Coombes, K. R. (2008). Bayesian analysis of mass spectrometry proteomic data using wavelet-based functional mixed models. Biometrics 64 479-489, 667. · Zbl 1137.62399
[39] Polson, N. G. and Scott, J. G. (2011). Shrink globally, act locally: Sparse Bayesian regularization and prediction. In Bayesian Statistics 9 501-538. Oxford Univ. Press, Oxford.
[40] Reed, C. and Yu, K. (2009). A partially collapsed Gibbs sampler for Bayesian quantile regression.
[41] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge Series in Statistical and Probabilistic Mathematics 12. Cambridge Univ. Press, Cambridge.
[42] Sorace, J. M. and Zhan, M. (2003). A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinform. 4 Art. ID 24.
[43] Sriram, K. (2015). A sandwich likelihood correction for Bayesian quantile regression based on the misspecified asymmetric Laplace density. Statist. Probab. Lett. 107 18-26. · Zbl 1357.62132
[44] Syring, N. and Martin, R. (2019). Calibrating general posterior credible regions. Biometrika 106 479-486. · Zbl 1454.62105
[45] R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[46] van der Pas, S. L., Kleijn, B. J. K. and van der Vaart, A. W. (2014). The horseshoe estimator: Posterior concentration around nearly black vectors. Electron. J. Stat. 8 2585-2618. · Zbl 1309.62060
[47] Wang, H. J., Zhu, Z. and Zhou, J. (2009). Quantile regression in partially linear varying coefficient models. Ann. Statist. 37 3841-3866. · Zbl 1191.62077
[48] Xi, R., Li, Y. and Hu, Y. (2016). Bayesian quantile regression based on the empirical likelihood with spike and slab priors. Bayesian Anal. 11 821-855. · Zbl 1357.62181
[49] Yang, Y. and He, X. (2012). Bayesian empirical likelihood for quantile regression. Ann. Statist. 40 1102-1131. · Zbl 1274.62458
[50] Yang, Y., Wang, H. J. and He, X. (2016). Posterior inference in Bayesian quantile regression with asymmetric Laplace likelihood. Int. Stat. Rev. 84 327-344.
[51] Yee, N. S., Chan, A. S., Yee, J. D. and Yee, R. K. (2012). TRPM7 and TRPM8 ion channels in pancreatic adenocarcinoma: Potential roles as cancer biomarkers and targets. Scientifica 2012 Art. ID 415158.
[52] Yu, K. and Moyeed, R. A. (2001). Bayesian quantile regression. Statist. Probab. Lett. 54 437-447. · Zbl 0983.62017
[53] Yue, Y. R. and Rue, H. (2011). Bayesian inference for additive mixed quantile regression models. Comput. Statist. Data Anal. 55 84-96. · Zbl 1247.62101
[54] Zhang, J.