Bi-level feature selection in high dimensional AFT models with applications to a genomic study. (English) Zbl 1445.92177

Summary: We propose a new bi-level feature selection method for high-dimensional accelerated failure time (AFT) models, obtained by reformulating the AFT model as a single-index model. The method yields solutions that are sparse at both the group level and the individual-feature level, and comes with an algorithm that is computationally efficient and easy to implement. We illustrate the method on a genomic dataset and present a simulation study of its finite-sample performance.
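
The summary does not state which penalty the authors use, so the following is only a minimal sketch of what sparsity "at both the group and individual feature levels" means. It assumes a generic sparse-group-lasso proximal step, which is a stand-in and not the paper's method. The function name `sparse_group_threshold`, the tuning parameters `lam_individual` and `lam_group`, and the toy coefficients are all hypothetical.

```python
import numpy as np

def sparse_group_threshold(beta, groups, lam_individual, lam_group):
    """Proximal step for a sparse group lasso penalty (illustrative assumption,
    not the penalty of the summarized paper)."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        # Individual level: soft-threshold each coefficient in the group.
        z = np.sign(beta[idx]) * np.maximum(np.abs(beta[idx]) - lam_individual, 0.0)
        # Group level: shrink the whole block; drop it if its norm is too small.
        norm = np.linalg.norm(z)
        if norm > lam_group:
            out[idx] = (1.0 - lam_group / norm) * z
    return out

# Toy coefficients: two groups of three features each (hypothetical values).
beta = [3.0, 0.2, -2.0, 0.3, -0.1, 0.4]
groups = [0, 0, 0, 1, 1, 1]
shrunk = sparse_group_threshold(beta, groups, lam_individual=0.5, lam_group=1.0)
print(shrunk)
```

In this toy run the second group is removed entirely (group-level sparsity), while inside the first group the small middle coefficient is set to zero and the other two survive (individual-level sparsity).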

MSC:

92D10 Genetics and epigenetics
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

PANTHER
