ANF swMATH ID: 12276 Software Authors: C.R. Palmer, P.B. Gibbons, C. Faloutsos Description: ANF: a fast and scalable tool for data mining in massive graphs. Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete ”neighbourhood function” for a graph. For the Internet graph (268K nodes), ANF’s highly-accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF’s approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk resident graphs. Finally, we present some of our results from mining large graphs using ANF. Homepage: http://dl.acm.org/citation.cfm?id=775059 Related Software: HyperLogLog; gSpan; WebGraph; KronFit; SNAP; CloseGraph; CMAR; UCI-ml; nauty; GraphChi; UbiCrawler; Graphs; MetExplore; Pajek datasets; TAG; Pastry; Chord Cited in: 12 Publications all top 5 Cited by 45 Authors 2 Crescenzi, Pierluigi 2 Grossi, Roberto 2 Lanzi, Leonardo 2 Marino, Andrea 1 Bawa, Mayank 1 Cami, Aurel 1 Carpi, Laura C. 1 Chakrabarti, Deepayan 1 Cheng, Hong 1 Csalogány, Károly 1 Deo, Narsingh 1 Faloutsos, Christos 1 Fogaras, Dániel 1 Frery, Alejandro C. 1 Garcia-Molina, Hector 1 Ghahramani, Zoubin 1 Gionis, Aristides 1 Guan, Xiaohong 1 Habib, Michel A. 1 Huang, Xin 1 Kleinberg, Jon Michael 1 Kosters, Walter A. 1 Leskovec, Jure 1 Li, Ronghua 1 Lui, John C. S. 1 Motwani, Rajeev 1 Nadarajan, Rethnaswamy 1 Nirmala, P. 1 Pardalos, Panos M. 1 Rácz, Balázs 1 Ravetti, Martín Gómez 1 Roddick, John F. 1 Rosso, Osvaldo A. 1 Sarlós, Tamás 1 Schieber, Tiago Alves 1 Shang, Zechao 1 Takes, Frank W. 1 Thilaga, M. 1 Towsley, Donald Fred 1 Vijayalakshmi, R. 1 Wang, Pinghui 1 Woodruff, David P. 1 Yu, Jeffrey Xu 1 Zhang, Qin 1 Zhao, Junzhou all top 5 Cited in 10 Serials 2 Information Sciences 1 Physics Letters. A 1 Journal of Computer and System Sciences 1 Networks 1 Theoretical Computer Science 1 Distributed Computing 1 Journal of Graph Algorithms and Applications 1 Journal of Machine Learning Research (JMLR) 1 Internet Mathematics 1 Algorithms Cited in 4 Fields 8 Computer science (68-XX) 7 Combinatorics (05-XX) 2 Game theory, economics, finance, and other social and behavioral sciences (91-XX) 1 Information and communication theory, circuits (94-XX) Citations by Year