Mayrink, Vinicius Diniz; Lucas, Joseph Edward
Sparse latent factor models with interactions: analysis of gene expression data. (English) Zbl 1288.62164
Ann. Appl. Stat. 7, No. 2, 799-822 (2013).

Summary: Sparse latent multi-factor models have been used in many exploratory and predictive problems with high-dimensional multivariate observations. Because of concerns with identifiability, the latent factors are almost always assumed to be linearly related to the measured feature variables. We explore the analysis of multi-factor models with different structures of interactions between latent factors, including multiplicative effects as well as a more general framework for nonlinear interactions introduced via a Gaussian process. We utilize sparsity priors to test whether the factors and interaction terms have significant effects. The performance of the models is evaluated through simulated and real data applications in genomics. Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. We examine interactions between factors directly associated with different chromosomal regions detected with copy number alteration in breast cancer data. In this context, significant interaction effects for specific genes suggest synergies between duplications and deletions in different regions of the chromosome.

Cited in 6 Documents

MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
65C60 Computational problems in statistics (MSC2010)
92C50 Medical applications (general)

Software: BioHMM
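As a rough illustration of the multiplicative-interaction structure mentioned in the summary, the following sketch simulates data from a two-factor model whose third regressor is the product of the two latent factors, with many loadings set exactly to zero. It is not the authors' code or their prior specification: the function name, the default sizes, the Gaussian draws, and the simple "zero with fixed probability" sparsity mechanism are all assumptions made here for illustration, and the Gaussian process variant is not covered.

```python
import numpy as np


def simulate_interaction_factor_data(n_features=50, n_samples=100,
                                     sparsity=0.7, noise_sd=0.1, seed=0):
    """Simulate features driven by two latent factors and their product.

    Hypothetical helper: names and distributions are illustrative only.
    """
    rng = np.random.default_rng(seed)

    # Two latent factors, one value per sample.
    factors = rng.standard_normal((2, n_samples))

    # Multiplicative interaction between the two factors.
    interaction = factors[0] * factors[1]

    # Stack factors and interaction into a 3 x n_samples matrix.
    design = np.vstack([factors, interaction])

    # Sparse loadings: each entry is zero with probability `sparsity`,
    # otherwise drawn from a standard normal (an assumed mechanism).
    active = rng.random((n_features, 3)) > sparsity
    loadings = np.where(active, rng.standard_normal((n_features, 3)), 0.0)

    # Observed data: linear effects plus interaction effect plus noise.
    noise = noise_sd * rng.standard_normal((n_features, n_samples))
    data = loadings @ design + noise
    return data, loadings, design
```

In this sketch, a feature whose third loading is nonzero carries an interaction effect; testing whether such loadings differ from zero is the kind of question the sparsity priors in the summary are meant to address.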