×

iTri: index-based triangle listing in massive graphs. (English) Zbl 1394.68128

Summary: Triangle listing is a basic operator when dealing with many graph problems. However, in-memory algorithms do not work well with recently developed massive graphs such as social networks because these graphs cannot be accommodated in the memory. Thus, external memory-based algorithms have been proposed recently, but these approaches still require frequent multiple scans of the whole graph on the disk and large volumes of calculations are performed that involve the whole graph during every iteration. In this study, we propose a novel index-based method for listing triangles in massive graphs. First, we present new notions for the vertex range index and potential cone vertex index. Next, we propose an index join-based triangle listing algorithm. Our method accesses the indexed data asynchronously and joins them to list triangles using a multi-threaded parallel processing technique. Based on experiments, we demonstrate that our algorithm outperforms the state-of-the-art solution methods by three to eight times in terms of the wall clock time.

MSC:

68P20 Information storage and retrieval of data
05C85 Graph algorithms (graph-theoretic aspects)
68R10 Graph theory (including graph drawing) in computer science
68W10 Parallel algorithms in computer science
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Aggarwal, A.; Vitter, S.; Jeffrey, S. V, The input/output complexity of sorting and related problems, Commun. ACM, 31, 9, 1116-1127, (1988)
[2] Batagelj, V.; Zaveršnik, M., Short cycle connectivity, Discret. Math., 307, 3, 310-318, (2007) · Zbl 1111.05054
[3] Becchetti, L.; Boldi, P.; Castillo, C.; Gionis, A., Efficient semi-streaming algorithms for local triangle counting in massive graphs, Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 16-24, (2008), ACM
[4] Björklund, A.; Pagh, R.; Williams, V. V.; Zwick, U., Listing triangles, Automata, Languages, and Programming, 223-234, (2014), Springer · Zbl 1410.05191
[5] Buriol, L. S.; Frahling, G.; Leonardi, S.; Marchetti-Spaccamela, A.; Sohler, C., Counting triangles in data streams, Proceedings of the Twenty-fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’06), 253-262, (2006), ACM New York, NY, USA
[6] Chiba, N.; Nishizeki, T., Arboricity and subgraph listing algorithms, SIAM J. Comput., 14, 1, 210-223, (1985) · Zbl 0572.68053
[7] Chu, S.; Cheng, J., Triangle listing in massive networks and its applications, Proceedings of the Seventeenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), 672-680, (2011), ACM New York, NY, USA
[8] Chu, S.; Cheng, J., Triangle listing in massive networks, ACM Trans. Knowl. Discov. Data, 6, 4, 17:1-17:32, (2012)
[9] DeWitt, D. J.; Naughton, J. F.; Burger, J., Nested loops revisited, Proceedings of the Second International Conference on Parallel and Distributed Information Systems, 230-242, (1993), IEEE Computer Society Washington, DC, USA
[10] Eppstein, D.; Goodrich, M. T.; Strash, D.; Trott, L., Extended dynamic subgraph statistics using h-index parameterized data structures, Proceedings of the Fourth International Conference on Combinatorial Optimization and Applications - Volume Part I (COCOA ’10), 128-141, (2010), Springer-Verlag Berlin, Heidelberg · Zbl 1310.68161
[11] Facebook, Facebook statistics, Jun 2015, url: http://www.statisticbrain.com/facebook-statistics.
[12] Fomin, F. V.; Todinca, I.; Villanger, Y., Large induced subgraphs via triangulations and CMSO, SIAM J. Comput., 44, 1, 54-87, (2015) · Zbl 1357.05144
[13] Graefe, G., Query evaluation techniques for large databases, ACM Comput. Surv., 25, 2, 73-169, (1993)
[14] Hu, X.; Tao, Y.; Chung, C.-W., Massive graph triangulation, Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD ’13), 325-336, (2013), ACM New York, NY, USA
[15] Hu, X.; Tao, Y.; Chung, C.-W., I/o-efficient algorithms on triangle listing and counting, ACM Trans. Database Syst., 39, 27:1-27:30, (2014) · Zbl 1474.68226
[16] Itai, A.; Rodeh, M., Finding a minimum circuit in a graph, Proceedings of the Ninth Annual ACM Symposium on Theory of Computing (STOC ’77), 1-10, (1977), ACM New York, NY, USA
[17] Jha, M.; Seshadhri, C.; Pinar, A., A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox, ACM Trans. Knowl. Discov. Data (TKDD), 9, 3, 15, (2015)
[18] Kim, J.; Han, W.-S.; Lee, S.; Park, K.; Yu, H., OPT: A new framework for overlapped and parallel triangulation in large-scale graphs, Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD ’14), 637-648, (2014), ACM New York, NY, USA
[19] Korth, H. F.; Silberschatz, A., Database System Concepts, (1986), McGraw-Hill, Inc. New York, NY, USA · Zbl 0597.68068
[20] Latapy, M., Main-memory triangle computations for very large (sparse (power-law)) graphs, Theor. Comput. Sci., 407, 1-3, 458-473, (2008) · Zbl 1152.68045
[21] Lim, Y.; Kang, U., MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams, Proceedings of the Twenty-first ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 685-694, (2015), ACM
[22] May, J. M., Parallel I/O for high performance computing, (2001), Morgan Kaufmann
[23] Mishra, P.; Eich, M. H., Join processing in relational databases, ACM Comput. Surv., 24, 1, 63-113, (1992)
[24] Newman, M. E.; Watts, D. J.; Strogatz, S. H., Random graph models of social networks, Proc. Natl. Acad. Sci. United States Am., 99, Suppl 1, 2566-2572, (2002) · Zbl 1114.91362
[25] Pagh, R.; Silvestri, F., The input/output complexity of triangle enumeration, Proceedings of the Thirty-third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’14), 224-233, (2014), ACM New York, NY, USA
[26] Park, H.-M.; Chung, C.-W., An efficient mapreduce algorithm for counting triangles in a very large graph, Proceedings of the Twenty-second ACM International Conference on Information and Knowledge Management, 539-548, (2013), ACM
[27] Park, H.-M.; Silvestri, F.; Kang, U.; Pagh, R., Mapreduce triangle enumeration with guarantees, Proceedings of the Twenty-third ACM International Conference on Information and Knowledge Management, 1739-1748, (2014), ACM
[28] Pavan, A.; Tangwongan, K.; Tirthapura, S., Parallel and distributed triangle counting on graph streams, Technical Report, (2013), IBM
[29] Schank, T.; Wagner, D., Finding, counting and listing all triangles in large graphs, an experimental study, Proceedings of the Fourth International Conference on Experimental and Efficient Algorithms (WEA ’05), 606-609, (2005), Springer-Verlag Berlin, Heidelberg · Zbl 1121.68360
[30] Shun, J.; Tangwongsan, K., Multicore triangle computations without tuning, Proceedings of the IEEE International Conference on Data Engineering (ICDE), (2015)
[31] Tangwongsan, K.; Pavan, A.; Tirthapura, S., Parallel triangle counting in massive streaming graphs, Proceedings of the Twenty-second ACM international Conference on Information and Knowledge Management, 781-786, (2013), ACM
[32] Twitter, Twitter statistics, Mar 2015, url: http://www.statisticbrain.com/twitter-statistics.
[33] Vitter, J. S., Algorithms and data structures for external memory, Found. Trends Theor. Comput. Sci., 2, 4, 305-474, (2008) · Zbl 1244.68007
[34] Wang, N.; Zhang, J.; Tan, K.-L.; Tung, A. K., On triangulation-based dense neighborhood graph discovery, Proc. VLDB Endow., 4, 2, 58-68, (2010)
[35] Wasserman, S., Social Network Analysis: Methods and Applications, vol. 8, (1994), Cambridge University Press
[36] Watts, D. J.; Strogatz, S. H., Collective dynamics of small-world networks, Nature, 393, 6684, 440-442, (1998) · Zbl 1368.05139
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.