Probabilistic learning on graphs via contextual architectures. (English) Zbl 07255165
Summary: We propose a novel methodology for representation learning on graph-structured data, in which a stack of Bayesian networks learns different distributions of a vertex’s neighbourhood. Through an incremental construction policy and layer-wise training, we can build deeper architectures than typical graph convolutional neural networks, with benefits in terms of how context spreads between vertices. First, the model learns from graphs via maximum likelihood estimation without using target labels. Then, a supervised readout is applied to the learned graph embeddings to deal with graph classification and vertex classification tasks, showing competitive results against neural models for graphs. The computational complexity is linear in the number of edges, facilitating learning on large-scale data sets. By studying how depth affects the performance of our model, we discover that a broader context generally improves performance. In turn, this leads to a critical analysis of some benchmarks used in the literature.
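As an illustration only, the following is a minimal sketch of the pipeline the summary describes: layers are added incrementally, each one fitted without labels on the frozen outputs of the layers below it, and the concatenated per-layer states form the vertex embedding that a separate supervised readout consumes. All names here (NeighbourhoodLayer, build_deep_embedding) are hypothetical, and the hard-EM (k-means-style) clustering step is a crude stand-in for the paper's Bayesian-network layers and their exact maximum-likelihood training; only the overall structure (label-free, layer-wise construction followed by a supervised readout) and the edge-linear neighbourhood aggregation mirror the summary.

# Minimal sketch, NOT the authors' implementation: hypothetical layer-wise pipeline.
import numpy as np


class NeighbourhoodLayer:
    """One unsupervised layer: clusters each vertex's [own features | aggregated
    neighbour states] with a hard-EM (k-means-style) maximum-likelihood fit.
    Stand-in for the paper's Bayesian-network layer trained by EM."""

    def __init__(self, n_states, rng):
        self.n_states = n_states
        self.rng = rng

    def _inputs(self, vertex_feats, adjacency, prev_states):
        # Neighbourhood aggregation: cost is linear in the number of edges.
        return np.hstack([vertex_feats, adjacency @ prev_states])

    def fit(self, vertex_feats, adjacency, prev_states, n_iters=10):
        x = self._inputs(vertex_feats, adjacency, prev_states)
        # Assumes more vertices than latent states.
        self.means = x[self.rng.choice(len(x), self.n_states, replace=False)]
        for _ in range(n_iters):
            z = ((x[:, None, :] - self.means[None, :, :]) ** 2).sum(-1).argmin(1)
            for k in range(self.n_states):
                if np.any(z == k):
                    self.means[k] = x[z == k].mean(0)
        return self

    def transform(self, vertex_feats, adjacency, prev_states):
        x = self._inputs(vertex_feats, adjacency, prev_states)
        z = ((x[:, None, :] - self.means[None, :, :]) ** 2).sum(-1).argmin(1)
        states = np.zeros((len(x), self.n_states))
        states[np.arange(len(x)), z] = 1.0  # hard assignment over latent states
        return states


def build_deep_embedding(vertex_feats, adjacency, n_layers=5, n_states=10, seed=0):
    """Incremental, layer-wise construction: layer i only sees the frozen states of
    layers < i, so depth grows without joint end-to-end training or target labels."""
    rng = np.random.default_rng(seed)
    states = np.ones((vertex_feats.shape[0], 1))  # uninformative 'layer 0' state
    per_layer = []
    for _ in range(n_layers):
        layer = NeighbourhoodLayer(n_states, rng).fit(vertex_feats, adjacency, states)
        states = layer.transform(vertex_feats, adjacency, states)
        per_layer.append(states)
    # Concatenated unsupervised embedding; a supervised readout (e.g. a linear
    # classifier over these features) would be trained afterwards.
    return np.hstack(per_layer)

For vertex classification the resulting embeddings could be fed to the readout directly; for graph classification they would typically be aggregated per graph (for instance summed or averaged) before the supervised stage, consistent with the two-phase scheme the summary outlines.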
MSC:
68T05 Learning and adaptive systems in artificial intelligence
Software:
Adam; NetKit; PyTorch