Effective graph classification based on topological and label attributes. (English) Zbl 07260330

Summary: Graph classification is an important data mining task, and various graph kernel methods have been proposed recently for this task. These methods have proven to be effective, but they tend to have high computational overhead. In this paper, we propose an alternative approach to graph classification that is based on feature vectors constructed from different global topological attributes, as well as global label features. The main idea is that the graphs from the same class should have similar topological and label attributes. Our method is simple and easy to implement, and via a detailed comparison on real benchmark datasets, we show that our topological and label feature-based approach delivers competitive classification accuracy, with significantly better results on those datasets that have large unlabeled graph instances. Our method is also substantially faster than most other graph kernels.


62-XX Statistics
68-XX Computer science
Full Text: DOI


[1] T. Joachims, T. Hofmann, Y. Yue, and C.-N. Yu, Predicting structured objects with support vector machines, Commun ACM 52(11) (2009), 97-104.
[2] L. Ralaivola, S. J. Swamidass, H. Saigo, and P. Baldi, Graph kernels for chemical informatics, Neural Netw 18 (2005), 1093-1110.
[3] P. Mah‘e and J.-P. Vert, Graph kernels based on tree patterns for molecules, Mach Learn 75(1) (2009), 3-35.
[4] K. M. Borgwardt, C. S. Ong, S. Sch¨onauer, S. V. N. Vishwanathan, A. J. Smola, and H.-P. Kriegel, Protein function prediction via graph kernels, In ISMB (Supplement of Bioinformatics), 2005, 47-56.
[5] C. Bilgin, C. Demir, C. Nagi, and B. Yener, Cell-graph mining for breast tissue modeling and classification, In 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2007, 5311-5314.
[6] B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Cambridge, MA, MIT Press, 2001.
[7] V. N. Vapnik, The Nature of Statistical Learning Theory, New York, Springer-Verlag, 1995. · Zbl 0833.62008
[8] T. G¨artner, P. Flach, and S. Wrobel, On graph kernels: hardness results and efficient alternatives, In 16th Annual Conference on Computational Learning Theory, 2003, 129-143. · Zbl 1274.68312
[9] H. Kashima, K. Tsuda, and A. Inokuchi, Marginalized kernels between labeled graphs, In International Conference on Machine Learning, 2003, 321-328.
[10] K. M. Borgwardt and H.-P. Kriegel, Shortest-path kernels on graphs, In 5th IEEE International Conference on Data Mining, Washington, DC, 2005, 74-81.
[11] T. Horv‘ath, T. G¨artner, and S. Wrobel, Cyclic pattern kernels for predictive graph mining, In 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, 158-167.
[12] J. Ramon and T. G¨artner, Expressivity versus efficiency of graphs kernels, In 1st International Workshop on Mining Graphs, Trees and Sequences, 2003.
[13] N. Shervashidze and K. M. Borgwardt, Fast subtree kernels on graphs, In Advances in Neural Information Processing Systems, 2009, 1660-1668.
[14] N. Shervashidze, S. V. Vishwanathan, T. H. Petri, K. Mehlhorn, and K. M. Borgwardt, Efficient graphlet kernels for large graph comparison, In 12th International
[15] I. R. Kondor, N. Shervashidze, and K. M. Borgwardt, The graphlet spectrum, In International Conference on Machine Learning, Vol. 382, A. P. Danyluk, L. Bottou, and M. L. Littman, eds., 2009, 67.
[16] M. Thoma, H. Cheng, A. Gretton, J. Han, H.-P. Kriegel, A. Smola, L. Song, P. S. Yu, X. Yan, and K. M. Borgwardt, Discriminative frequent subgraph mining with optimality guarantees, Stat Anal Data Mining 3(5) (2010), 302-318.
[17] S. V. N. Vishwanathan, K. M. Borgwardt, and N. N. Schraudolph, Fast computation of graph kernels, In Neural Information Processing Systems, B. Sch¨olkopf, J. Platt, and T. Hoffman, eds., Cambridge, MA, MIT Press, 2006, 1449-1456.
[18] I. R. Kondor and J. Lafferty, Diffusion kernels on graphs and other discrete structures, In International Conference on Machine Learning, 2002, 315-322.
[19] G. Li, M. Semerci, B. Yener, and M. J. Zaki, Graph classification via topological and label attributes, In 9th Workshop on Mining and Learning with Graphs (with SIGKDD), 2011.
[20] P. Mah‘e, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert, Extensions of marginalized graph kernels, In International Conference on Machine Learning, 2004.
[21] S. V. N. Vishwanathan, N. N. Schraudolph, I. R. Kondor, and K. M. Borgwardt, Graph kernels, J Mach Learn Res 11 (2010), 1201-1242. · Zbl 1242.05112
[22] X. Yan and J. Han, gspan: Graph-based substructure pattern mining, In IEEE International Conference on Data Mining, 2002.
[23] C. Meyer, Matrix Analysis and Applied Linear Algebra, Philadelphia, PA, SIAM, 2000.
[24] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, Philadelphia, SIAM, 2000. · Zbl 0965.65058
[25] S. Nijssen and J. Kok, A quickstart in frequent structure mining can make a difference, Proceedings of the 2004 ACM SIGKDD international conference o n Knowledge discovery and data mining, 2004, 647-652.
[26] E. Jones, T. Oliphant, P. Peterson, et al. SciPy: Open source scientific tools for Python, 2001. Available at http://www. scipy.org.
[27] A. A. Hagberg, D. A. Schult, and P. J. Swart, Exploring network structure, dynamics, and function using NetworkX, In Proceedings of the 7th Python in Science Conference (SciPy2008), Pasadena, CA,2008, 11-15.
[28] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans Intell Syst Technol 2 (2011), 27:1-27:27. Software available at http://www.csie.ntu.edu. tw/∼cjlin/libsvm.
[29] A. Debnath, R. L. de Compadre, G. Debnath, A. Shusterman, and C. Hansch, The structure-activity relationship of mutagenic aromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity, J Med Chem 34 (1991), 786-797.
[30] N. Wale and G. Karypis, Comparison of descriptor spaces for chemical compound retrieval and classification, In IEEE International Conference on Data Mining, 2006.
[31] P. Dobson and A. J. Doig, Distinguishing enzyme structures from non-enzymes without alignments, J Mol Biol 330(4) (2003), 771-783.
[32] C. C. Bilgin, P. Bullough, G. E. Plopper, and B. Yener, Ecm-aware cell-graph mining for bone tissue modeling and classification, Data Mining Knowl Discov 20 (2010), 416-438.
[33] C. Demir, S. H. Gultekin, and B. Yener, Learning the topological properties of brain tumors, IEEE/ACM Trans Comput Biol Bioinform 2(3) (2005), 262-270.
[34] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Gene selection for cancer classification using support vector machines, Mach Learn 46(1) (2002), 389-422. · Zbl 0998.68111
[35] L. Lov´asz and K. Vesztergombi, Geometric representations of graphs, Paul Erdos and his Mathematics, 1999.
[36] U. Brandes and D. Fleischer, Centrality measures based on current flow, STACS 2005, 2005, 533-544. · Zbl 1118.90302
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.