Leday, Gwenaël G. R.; van der Vaart, Aad W.; van Wieringen, Wessel N.; van de Wiel, Mark A.
Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines. (English) Zbl 1288.62161
Ann. Appl. Stat. 7, No. 2, 823-845 (2013).

Summary: DNA copy numbers and mRNA expressions are widely used data types in cancer studies, which combined provide more insight than separately. Whereas in existing literature the form of the relationship between these two types of markers is fixed a priori, we model their associations. We employ piecewise linear regression splines (PLRS), which combine good interpretation with sufficient flexibility to identify any plausible type of relationships. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effects of DNA on mRNA and choosing an appropriate model. Furthermore, we present a novel approach to obtain reliable confidence bands for constrained PLRS, which incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight in the interaction between the two markers.

Cited in 2 Documents

MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
62J05 Linear regression; mixed models
92C50 Medical applications (general)
62F30 Parametric inference under constraints
65C60 Computational problems in statistics (MSC2010)

Software: DR-Integrator; ic.infer; CGHcall

References:
[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971) (B. N. Petrov and F. Csaki, eds.)
267-281. Akadémiai Kiadó, Budapest. · Zbl 0283.62006
[2] Andrews, D. W. K. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica 68 399-405. · Zbl 1015.62044 · doi:10.1111/1468-0262.00114
[3] Arnold, B. C. and Shavelle, R. M. (1998). Joint confidence sets for the mean and variance of a normal distribution. Amer. Statist. 52 133-140.
[4] Asimit, J. L., Andrulis, I. L. and Bull, S. B. (2011). Regression models, scan statistics and reappearance probabilities to detect regions of association between gene expression and copy number. Stat. Med. 30 1157-1178. · doi:10.1002/sim.4193
[5] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014
[6] Bicciato, S., Spinelli, R., Zampieri, M., Mangano, E., Ferrari, F., Beltrame, L., Cifola, I., Peano, C., Solari, A. and Battaglia, C. (2009). A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets. Nucleic Acids Res. 37 5057-5070.
[7] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge. · Zbl 1058.90049
[8] Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika 52 345-370. · Zbl 0627.62005 · doi:10.1007/BF02294361
[9] Brown, L. D., Cai, T. T. and DasGupta, A. (2003). Interval estimation in exponential families. Statist. Sinica 13 19-49. · Zbl 1017.62027
[10] Buckland, S. T., Burnham, K. P. and Augustin, N. H. (1997). Model selection: An integral part of inference. Biometrics 53 603-618. · Zbl 0885.62118 · doi:10.2307/2533961
[11] Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed. Springer, New York.
· Zbl 1005.62007 · doi:10.1007/b97636
[12] Carvalho, B., Postma, C., Mongera, S., Hopmans, E., Diskin, S., van de Wiel, M. A., van Criekinge, W., Thas, O., Matthäi, A., Cuesta, M. A., Droste, J. S. T. S., Craanen, M., Schröck, E., Ylstra, B. and Meijer, G. A. (2009). Multiple putative oncogenes at the chromosome 20q amplicon contribute to colorectal adenoma to carcinoma progression. Gut 58 79-89.
[13] Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Statistics 25 573-578. · Zbl 0056.37102 · doi:10.1214/aoms/1177728725
[14] Gouriéroux, C., Holly, A. and Monfort, A. (1982). Likelihood ratio test, Wald test, and Kuhn-Tucker test in linear models with inequality constraints on the regression parameters. Econometrica 50 63-80. · Zbl 0483.62058 · doi:10.2307/1912529
[15] Grömping, U. (2010). Inference with linear equality and inequality constraints using R: The package ic.infer. J. Stat. Softw. 33 1-31.
[16] Gu, W., Choi, H. and Ghosh, D. (2008). Global associations between copy number and transcript mRNA microarray data: An empirical study. Cancer Inform. 6 17-23.
[17] Hughes, A. W. and King, M. L. (2003). Model selection using AIC in the presence of one-sided information. J. Statist. Plann. Inference 115 397-411. · Zbl 1022.62002 · doi:10.1016/S0378-3758(02)00159-3
[18] Jörnsten, R., Abenius, T., Kling, T., Schmidt, L., Johansson, E., Nordling, T. E. M., Nordlander, B., Sander, C., Gennemark, P., Funa, K., Nilsson, B., Lindahl, L. and Nelander, S. (2011). Network modeling of the transcriptional effects of copy number aberrations in glioblastoma. Mol. Syst. Biol. 7 486.
[19] Kodde, D. A. and Palm, F. C. (1986). Wald criteria for jointly testing equality and inequality restrictions. Econometrica 54 1243-1248. · Zbl 0595.62013 · doi:10.2307/1912331
[20] Kudô, A. (1963). A multivariate analogue of the one-sided test. Biometrika 50 403-418. · Zbl 0121.13906 · doi:10.1093/biomet/50.3-4.403
[21] Leday, G. G. R., van der Vaart, A.
W., van Wieringen, W. N. and van de Wiel, M. A. (2013). Supplement to “Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines.” · Zbl 1288.62161
[22] Lee, H., Kong, S. W. and Park, P. J. (2008). Integrative analysis reveals the direct and indirect interactions between DNA copy number aberrations and gene expression changes. Bioinformatics 24 889-896. · Zbl 1279.92064
[23] Lipson, D., Ben-Dor, A., Dehan, E. and Yakhini, Z. (2004). Joint analysis of DNA copy numbers and gene expression levels. In Algorithms in Bioinformatics. Lecture Notes in Computer Science 3240 135-146. Springer, Berlin. · doi:10.1007/978-3-540-30219-3_12
[24] Meeker, W. Q. and Escobar, L. A. (1995). Teaching about approximate confidence regions based on maximum likelihood estimation. Amer. Statist. 49 48-53.
[25] Menezes, R., Boetzer, M., Sieswerda, M., van Ommen, G. J. and Boer, J. (2009). Integrated analysis of DNA copy number and gene expression microarray data using gene sets. BMC Bioinformatics 10 203+.
[26] Neve, R. M., Chin, K., Fridlyand, J., Yeh, J., Baehner, F. L., Fevr, T., Clark, L., Bayani, N., Coppe, J.-P. P., Tong, F., Speed, T., Spellman, P. T., DeVries, S., Lapuk, A., Wang, N. J., Kuo, W.-L. L., Stilwell, J. L., Pinkel, D., Albertson, D. G., Waldman, F. M., McCormick, F., Dickson, R. B., Johnson, M. D., Lippman, M., Ethier, S., Gazdar, A. and Gray, J. W. (2006). A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. 10 515-527.
[27] Olshen, A. B., Venkatraman, E. S., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5 557-572. · Zbl 1155.62478 · doi:10.1093/biostatistics/kxh008
[28] Peng, J., Zhu, J., Bergamaschi, A., Han, W., Noh, D.-Y., Pollack, J. R. and Wang, P. (2010).
Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Ann. Appl. Stat. 4 53-77. · Zbl 1189.62174 · doi:10.1214/09-AOAS271
[29] Pinkel, D. and Albertson, D. G. (2005). Array comparative genomic hybridization and its applications in cancer. Nat. Genet. 37 Suppl. S11-S17.
[30] Quackenbush, J. (2002). Microarray data normalization and transformation. Nat. Genet. 32 Suppl. 496-501.
[31] Robertson, T., Wright, F. T. and Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley, Chichester. · Zbl 0645.62028
[32] Salari, K., Tibshirani, R. and Pollack, J. R. (2010). DR-Integrator: A new analytic tool for integrating DNA copy number and gene expression data. Bioinformatics 26 414-416.
[33] Schäfer, M., Schwender, H., Merk, S., Haferlach, C., Ickstadt, K. and Dugas, M. (2009). Integrated analysis of copy number alterations and gene expression: A bivariate assessment of equally directed abnormalities. Bioinformatics 25 3228-3235.
[34] Shapiro, A. (1988). Towards a unified theory of inequality constrained testing in multivariate analysis. Internat. Statist. Rev. 56 49-62. · Zbl 0661.62042 · doi:10.2307/1403361
[35] Silvapulle, M. J. and Sen, P. K. (2005). Constrained Statistical Inference: Inequality, Order, and Shape Restrictions. Wiley, Hoboken, NJ. · Zbl 1077.62019 · doi:10.1002/9781118165614
[36] Solvang, H. K., Lingjærde, O. C., Frigessi, A., Børresen-Dale, A.-L. and Kristensen, V. N. (2011). Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer. BMC Bioinformatics 12 197.
[37] Soneson, C., Lilljebjörn, H., Fioretos, T. and Fontes, M. (2010). Integrative analysis of gene expression and copy number alterations using canonical correlation analysis. BMC Bioinformatics 11 191.
[38] van de Wiel, M. A., Kim, K. I., Vosse, S. J., van Wieringen, W. N., Wilting, S. M. and Ylstra, B. (2007).
CGHcall: Calling aberrations for array CGH tumor profiles. Bioinformatics 23 892-894.
[39] van de Wiel, M. A., Picard, F., van Wieringen, W. N. and Ylstra, B. (2011). Preprocessing and downstream analysis of microarray DNA copy number profiles. Brief. Bioinformatics 12 10-21.
[40] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge. · Zbl 0910.62001 · doi:10.1017/CBO9780511802256
[41] van Wieringen, W. N., Berkhof, J. and van de Wiel, M. A. (2010). A random coefficients model for regional co-expression associated with DNA copy number. Stat. Appl. Genet. Mol. Biol. 9 30. · Zbl 1304.92064
[42] van Wieringen, W. N., van de Wiel, M. A. and Ylstra, B. (2007). Normalized, segmented or called aCGH data? Cancer Inform. 3 321-327.
[43] van Wieringen, W. N. and van de Wiel, M. A. (2009). Nonparametric testing for DNA copy number induced differential mRNA gene expression. Biometrics 65 19-29. · Zbl 1159.62087 · doi:10.1111/j.1541-0420.2008.01052.x
[44] VanAntwerp, J. (2000). A tutorial on linear and bilinear matrix inequalities. J. Process Contr. 10 363-385.
[45] Vandenberghe, L. and Boyd, S. (1996). Semidefinite programming. SIAM Rev. 38 49-95. · Zbl 0845.65023 · doi:10.1137/1038003

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.