A constrained matrix-variate Gaussian process for transposable data. (English) Zbl 1320.62052

Summary: Transposable data represent interactions between two sets of entities and are typically stored as a matrix containing the known interaction values. Additional side information may consist of feature vectors specific to the entities corresponding to the rows and/or columns of such a matrix. Further information may also be available in the form of interactions or hierarchies among entities along the same mode (axis). We propose a novel approach for modeling transposable data with missing interactions given additional side information. The interactions are modeled as noisy observations from a latent noise-free matrix generated from a matrix-variate Gaussian process. Constructing the row and column covariances from side information provides a flexible mechanism for specifying a priori knowledge of the row and column correlations in the data. Moreover, such a prior combined with the side information enables predictions for new rows and columns not observed in the training data. In this work, we combine the matrix-variate Gaussian process model with low-rank constraints. The constrained Gaussian process approach is applied to predicting hidden associations between genes and diseases using a small set of observed associations together with prior covariances induced by gene-gene interaction networks and disease ontologies. The proposed approach is also applied to recommender-system data, where item ratings of users are predicted from known associations together with prior covariances induced by social networks. We present experimental results that highlight the performance of the constrained matrix-variate Gaussian process compared to state-of-the-art approaches in each domain.
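The generative model in the summary can be sketched as a Gaussian process over matrix entries with a Kronecker-structured covariance, where row and column covariances are built from side-information features and missing entries are predicted by GP conditioning. This is a minimal illustrative sketch, not the paper's method: the feature dimensions, RBF kernel choice, and jitter are assumptions, and the low-rank constraint central to the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X, ls=1.0):
    # Squared-exponential kernel on feature vectors (illustrative choice).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

# Hypothetical side information: feature vectors for rows and columns.
n_rows, n_cols = 8, 6
row_feats = rng.normal(size=(n_rows, 3))
col_feats = rng.normal(size=(n_cols, 2))

K_row = rbf(row_feats)  # row covariance induced by side information
K_col = rbf(col_feats)  # column covariance induced by side information
# Covariance of vec(M) (column-major stacking), with jitter for stability.
K = np.kron(K_col, K_row) + 1e-8 * np.eye(n_rows * n_cols)

# Sample a latent noise-free matrix, then add observation noise.
sigma2 = 0.01
m = rng.multivariate_normal(np.zeros(n_rows * n_cols), K)
y = m + rng.normal(scale=np.sqrt(sigma2), size=m.shape)

# Observe a random subset of entries; predict the rest by conditioning.
obs = rng.choice(n_rows * n_cols, size=30, replace=False)
mis = np.setdiff1d(np.arange(n_rows * n_cols), obs)

K_oo = K[np.ix_(obs, obs)] + sigma2 * np.eye(len(obs))
K_mo = K[np.ix_(mis, obs)]
pred = K_mo @ np.linalg.solve(K_oo, y[obs])  # posterior mean at missing entries
```

Because the prior covariances come from row/column features rather than the observed matrix alone, the same conditioning applies to entirely new rows or columns with known features, which is the cold-start scenario the summary highlights.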


62F15 Bayesian inference
60G15 Gaussian processes
62F30 Parametric inference under constraints
62J02 General nonlinear regression


Medlda; ProDiGe; PRMLT
Full Text: DOI arXiv

