
Attentional multilabel learning over graphs: a message passing approach. (English) Zbl 1493.68297

Summary: We address the largely open problem of multilabel classification over graphs. Unlike traditional vector inputs, a graph has rich variable-size substructures that relate to the labels in nontrivial ways. We believe that uncovering these relations might hold the key to classification performance and explainability. We introduce the Graph Attention model for Multi-Label learning (GAML), a novel graph neural network that handles this problem effectively. GAML regards labels as auxiliary nodes and models them in conjunction with the input graph. By iteratively applying neural message passing and an attention mechanism to both the label nodes and the input nodes, GAML captures the relations between the labels and the input subgraphs at multiple resolution scales. Moreover, our model can exploit explicit label dependencies. It also scales linearly with the number of labels and the graph size, thanks to our proposed hierarchical attention. We evaluate GAML in an extensive set of experiments with both graph-structured and classical unstructured inputs. The results show that GAML significantly outperforms other competing methods. Importantly, GAML enables intuitive visualizations for better understanding of the label-substructure relations and for explaining the model's behavior.
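To make the described architecture concrete, the following is a minimal PyTorch sketch of the idea in the summary: labels are modeled as auxiliary nodes whose states are refined jointly with the input-graph nodes through message passing and attention. The GRU update cells, the scaled dot-product attention, and all layer sizes here are illustrative assumptions, not the paper's actual equations; in particular, the hierarchical attention that gives GAML its linear scaling is not reproduced.

```python
# Minimal sketch of "labels as auxiliary nodes" with iterative message
# passing and attention. All design choices (GRU cells, dot-product
# attention, layer sizes) are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAMLSketch(nn.Module):
    def __init__(self, node_dim, hidden_dim, num_labels, num_steps=3):
        super().__init__()
        self.num_steps = num_steps
        self.encode = nn.Linear(node_dim, hidden_dim)
        # one learnable state per label: the "auxiliary label nodes"
        self.label_emb = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.node_cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.label_cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, 1)

    def forward(self, x, adj):
        # x:   [N, node_dim]  input-node features
        # adj: [N, N]         adjacency matrix (1.0 where an edge exists)
        h = torch.tanh(self.encode(x))        # input-node states [N, H]
        z = self.label_emb.clone()            # label-node states [L, H]
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        for _ in range(self.num_steps):
            # (1) message passing among input nodes: mean over neighbors
            node_msg = adj @ h / deg
            h = self.node_cell(node_msg, h)
            # (2) each label node attends over all input-node states
            att = F.softmax(z @ h.t() / h.size(1) ** 0.5, dim=1)  # [L, N]
            z = self.label_cell(att @ h, z)
        return self.readout(z).squeeze(-1)    # one logit per label

# usage: a toy graph with 5 nodes, 16-dim features, 4 labels
model = GAMLSketch(node_dim=16, hidden_dim=32, num_labels=4)
logits = model(torch.randn(5, 16), torch.eye(5))
probs = torch.sigmoid(logits)  # independent per-label probabilities
```

A sigmoid over the per-label logits yields independent label probabilities, the standard readout for multilabel classification; because the label states attend over node states at every step, the attention weights can also be inspected to visualize which substructures drive each label, in the spirit of the visualizations the summary mentions.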

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
