×

Fregel: a functional domain-specific language for vertex-centric large-scale graph processing. (English) Zbl 07492703

Summary: The vertex-centric programming model is now widely used for processing large graphs. User-defined vertex programs are executed in parallel over every vertex of a graph, but the imperative and explicit message-passing style of existing systems makes defining a vertex program unintuitive and difficult. This article presents Fregel, a purely functional domain-specific language for processing large graphs and describes its model, design, and implementation. Fregel is a subset of Haskell, so Haskell tools can be used to test and debug Fregel programs. The vertex-centric computation is abstracted using compositional programming that uses second-order functions on graphs provided by Fregel. A Fregel program can be compiled into imperative programs for use in the Giraph and Pregel+ vertex-centric frameworks. Fregel’s functional nature without side effects enables various transformations and optimizations during the compilation process. Thus, the programmer is freed from the burden of program optimization, which is manually done for existing imperative systems. Experimental results for typical examples demonstrated that the compiled code can be executed with reasonable and promising performance.

MSC:

68N18 Functional programming and lambda calculus
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] Bae, S. & Howe, B. (2015) Gossipmap: A distributed community detection algorithm for billion-edge directed graphs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, Austin, TX, USA, November 15-20, 2015, pp. 27:1-27:12.
[2] Bahr, P. & Axelsson, E. (2017) Generalising tree traversals and tree transformations to dags: Exploiting sharing without the pain. Sci. Comput. Program.137, 63-97.
[3] Bu, Y., Howe, B., Balazinska, M. & Ernst, M. D. (2012) The Haloop approach to large-scale iterative data analysis. VLDB J.21(2), 169-190.
[4] Capota, M., Hegeman, T., Iosup, A., Prat-Pérez, A., Erling, O. & Boncz, P. A. (2015) Graphalytics: A big data benchmark for graph-processing platforms. In Proceedings of the Third International Workshop on Graph Data Management Experiences and Systems, GRADES 2015, Melbourne, VIC, Australia, May 31-June 4, 2015, pp. 7:1-7:6.
[5] Caviness, B. F. & Johnson, J. R. (eds). (1998) Quantifier Elimination and Cylindrical Algebraic Decomposition. Springer Vienna. · Zbl 0906.03033
[6] Cruz, F., Rocha, R. & Goldstein, S. C. (2016) Declarative coordination of graph-based parallel programs. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, Barcelona, Spain, March 12-16, 2016, pp. 4:1-4:12.
[7] Dathathri, R., Gill, G., Hoang, L., Dang, H., Brooks, A., Dryden, N., Snir, M. & Pingali, K. (2018) Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018, Foster, J. S. & Grossman, D. (eds). ACM, pp. 752-768.
[8] De Moura, L. M. & Bjørner, N. (2011) Satisfiability modulo theories: Introduction and applications. Commun. ACM54(9), 69-77.
[9] Emoto, K. & Sadahira, F. (2020) A DSL for graph parallel programming with vertex subsets. J. Supercomput.76(7), 4998-5015.
[10] Emoto, K., Matsuzaki, K., Hu, Z., Morihata, A. & Iwasaki, H. (2016) Think like a vertex, behave like a function! A functional DSL for vertex-centric big graph processing. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, Nara, Japan, September 18-22, 2016, pp. 200-213.
[11] Erwig, M. (1997) Functional programming with graphs. In Proceedings of the 1997 ACM SIGPLAN International Conference on Functional Programming, ICFP 1997, Peyton Jones, S. L., Tofte, M. & Berman, A. M. (eds). Amsterdam, The Netherlands, June 9-11, 1997. ACM, pp. 52-65. · Zbl 1369.68095
[12] Erwig, M. (2001) Inductive graphs and functional graph algorithms. J. Funct. Program.11(5), 467-492. · Zbl 0994.68032
[13] Fegaras, L. & Sheard, T. (1996) Revisiting catamorphisms over datatypes with embedded functions (or, programs from outer space). In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. POPL 1996. ACM, pp. 284-294.
[14] Gao, Y., Zhou, W., Han, J., Meng, D., Zhang, Z. & Xu, Z. (2015) An evaluation and analysis of graph processing frameworks on five key issues. In Proceedings of the 12th ACM International Conference on Computing Frontiers, CF 2015, Ischia, Italy, May 18-21, 2015, pp. 11:1-11:8.
[15] Gonzalez, J. E., Low, Y., Gu, H., Bickson, D. & Guestrin, C. (2012) Powergraph: Distributed graph-parallel computation on natural graphs. In 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8-10, 2012, pp. 17-30.
[16] Gonzalez, J. E., Xin, R. S., Dave, A., Crankshaw, D., Franklin, M. J. & Stoica, I. (2014) Graphx: Graph processing in a distributed dataflow framework. In 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014, Broomfield, CO, USA, October 6-8, 2014, pp. 599-613.
[17] Guo, Y., Biczak, M., Varbanescu, A. L., Iosup, A., Martella, C. & Willke, T. L. (2014) How well do graph-processing platforms perform? An empirical performance evaluation and analysis. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, IPDPS 2014, Phoenix, AZ, USA, May 19-23, 2014, pp. 395-404.
[18] Hamana, M. (2010) Initial algebra semantics for cyclic sharing tree structures. Log. Methods Comput. Sci.6(3), 1-23. · Zbl 1209.68371
[19] Han, M. & Daudjee, K. (2015) Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. PVLDB8(9), 950-961.
[20] Han, M., Daudjee, K., Ammar, K., Özsu, M. T., Wang, X. & Jin, T. (2014) An experimental comparison of pregel-like graph processing systems. PVLDB7(12), 1047-1058.
[21] Hidaka, S., Asada, K., Hu, Z., Kato, H. & Nakano, K. (2013) Structural recursion for querying ordered graphs. In ACM SIGPLAN International Conference on Functional Programming, ICFP 2013, Boston, MA, USA - September 25-27, 2013, pp. 305-318. · Zbl 1323.68241
[22] Hong, S., Chafi, H., Sedlar, E. & Olukotun, K. (2012) Green-marl: A DSL for easy and efficient graph analysis. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2012, London, UK, March 3-7, 2012, pp. 349-362.
[23] Kalavri, V., Vlassov, V. & Haridi, S. (2018) High-level programming abstractions for distributed graph processing. IEEE Trans. Knowl. Data Eng. 30(2), 305-324.
[24] Kang, U., Tong, H., Sun, J., Lin, C. & Faloutsos, C. (2012) GBASE: An efficient analysis platform for large graphs. VLDB J.21(5), 637-650.
[25] Kang, U., Tsourakakis, C. E. & Faloutsos, C. (2011) PEGASUS: Mining peta-scale graphs. Knowl. Inf. Syst.27(2), 303-325.
[26] Kato, N. & Iwasaki, H. (2019) Eliminating unnecessary communications in the vertex-centric graph processing by the fregel compiler. Comput. Software36(2), 2_28-2_46. In Japanese.
[27] Khan, A. (2017) Vertex-centric graph processing: Good, bad, and the ugly. In Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21-24, 2017, pp. 438-441.
[28] Khan, A. & Elnikety, S. (2014) Systems for big-graphs. PVLDB7(13), 1709-1710.
[29] Liu, S. & Khan, A. (2018) An empirical analysis on expressibility of vertex centric graph processing paradigm. In IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, December 10-13, 2018, pp. 242-251.
[30] Liu, Y., Zhou, C., Gao, J. & Fan, Z. (2016) Giraphasync: Supporting online and offline graph processing via adaptive asynchronous message processing. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, IN, USA, October 24-28, 2016, pp. 479-488.
[31] Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C. & Hellerstein, J. M. (2012) Distributed graphlab: A framework for machine learning in the cloud. PVLDB5(8), 716-727.
[32] Lu, Y., Cheng, J., Yan, D. & Wu, H. (2014) Large-scale distributed graph computing systems: An experimental evaluation. PVLDB8(3), 281-292.
[33] Malewicz, G., Austern, M. H., Bik, A. J. C., Dehnert, J. C., Horn, I., Leiser, N. & Czajkowski, G. (2010) Pregel: A system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010, pp. 135-146.
[34] Mccune, R. R., Weninger, T. & Madey, G. (2015) Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Comput. Surv.48(2), 25:1-25:39.
[35] Morihata, A., Emoto, K., Matsuzaki, K., Hu, Z. & Iwasaki, H. (2018) Optimizing declarative parallel distributed graph processing by using constraint solvers. In Functional and Logic Programming - 14th International Symposium, FLOPS 2018, Nagoya, Japan, May 9-11, 2018, pp. 166-181. · Zbl 06900731
[36] Nguyen, D., Lenharth, A. & Pingali, K. (2013) A lightweight infrastructure for graph analytics. In ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP 2013, Farmington, PA, USA, November 3-6, 2013, pp. 456-471.
[37] Oliveira, B. C. D. S. & Cook, W. R. (2012) Functional programming with structured graphs. In ACM SIGPLAN International Conference on Functional Programming, ICFP 2012, Copenhagen, Denmark, September 9-15, 2012, Thiemann, P. & Findler, R. B. (eds). ACM, pp. 77-88. · Zbl 1291.68127
[38] Prountzos, D., Manevich, R. & Pingali, K. (2012) Elixir: A system for synthesizing concurrent graph programs. In Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, October 21-25, 2012, pp. 375-394.
[39] Prountzos, D., Manevich, R. & Pingali, K. (2015) Synthesizing parallel graph programs via automated planning. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015, Portland, OR, USA, June 15-17, 2015, pp. 533-544.
[40] Quamar, A. & Deshpande, A. (2016) Nscalespark: Subgraph-centric graph analytics on apache spark. In Proceedings of the 1st ACM SIGMOD Workshop on Network Data Analytics, NDA@SIGMOD 2016, San Francisco, California, USA, July 1, 2016, pp. 5:1-5:8.
[41] Quamar, A., Deshpande, A. & Lin, J. J. (2014) Nscale: Neighborhood-centric analytics on large graphs. PVLDB7(13), 1673-1676.
[42] Quamar, A., Deshpande, A. & Lin, J. J. (2016) Nscale: Neighborhood-centric large-scale graph analytics in the cloud. VLDB J.25(2), 125-150.
[43] Salihoglu, S. & Widom, J. (2013) GPS: A graph processing system. In Conference on Scientific and Statistical Database Management, SSDBM 2013, Baltimore, MD, USA, July 29-31, 2013, pp. 22:1-22:12.
[44] Salihoglu, S. & Widom, J. (2014) Optimizing graph algorithms on pregel-like systems. PVLDB7(7), 577-588.
[45] Satish, N., Sundaram, N., Patwary, M. M. A., Seo, J., Park, J., Hassaan, M. A., Sengupta, S., Yin, Z. & Dubey, P. (2014) Navigating the maze of graph analytics frameworks using massive graph datasets. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pp. 979-990.
[46] Sengupta, D., Song, S. L., Agarwal, K. & Schwan, K. (2015) Graphreduce: Processing large-scale graphs on accelerator-based systems. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, Austin, TX, USA, November 15-20, 2015, pp. 28:1-28:12.
[47] Seo, J., Park, J., Shin, J. & Lam, M. S. (2013) Distributed socialite: A datalog-based language for large-scale graph analysis. PVLDB6(14), 1906-1917.
[48] Simmhan, Y., Kumbhare, A. G., Wickramaarachchi, C., Nagarkar, S., Ravi, S., Raghavendra, C. S. & Prasanna, V. K. (2014) Goffish: A sub-graph centric framework for large-scale graph analytics. In Euro-Par 2014 Parallel Processing - 20th International Conference, Porto, Portugal, August 25-29, 2014. Proceedings, pp. 451-462.
[49] Song, S., Liu, X., Wu, Q., Gerstlauer, A., Li, T. & John, L. K. (2018) Start late or finish early: A distributed graph processing system with redundancy reduction. Proc. VLDB Endow. 12(2), 154-168.
[50] Tian, Y., Balmin, A., Corsten, S. A., Tatikonda, S. & Mcpherson, J. (2013) From “think like a vertex” to “think like a graph”. PVLDB7(3), 193-204.
[51] Valiant, L. G. (1990) A bridging model for parallel computation. Commun. ACM33(8), 103-111.
[52] Verma, S., Leslie, L. M., Shin, Y. & Gupta, I. (2017) An experimental comparison of partitioning strategies in distributed graph processing. PVLDB10(5), 493-504.
[53] Wang, G., Xie, W., Demers, A. J. & Gehrke, J. (2013) Asynchronous large-scale graph processing made easy. In Sixth Biennial Conference on Innovative Data Systems Research, CIDR 2013, Asilomar, CA, USA, January 6-9, 2013, Online Proceedings.
[54] Watts, D. J. & Strogatz, S. H. (1998) Collective dynamics of ‘small-world’ networks. Nature393(6684), 440-442. · Zbl 1368.05139
[55] Xie, C., Chen, R., Guan, H., Zang, B. & Chen, H. (2015) SYNC or ASYNC: Time to fuse for distributed graph-parallel computation. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA, February 7-11, 2015, pp. 194-204.
[56] Yan, D., Bu, Y., Tian, Y., Deshpande, A. & Cheng, J. (2016) Big graph analytics systems. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD 2016, San Francisco, CA, USA, June 26-July 01, 2016, pp. 2241-2243.
[57] Yan, D., Bu, Y., Tian, Y. & Deshpande, A. (2017) Big graph analytics platforms. Found. Trends Databases7(1-2), 1-195.
[58] Yan, D., Cheng, J., Lu, Y. & Ng, W. (2014a) Blogel: A block-centric framework for distributed computation on real-world graphs. PVLDB7(14), 1981-1992.
[59] Yan, D., Cheng, J., Lu, Y. & Ng, W. (2015) Effective techniques for message reduction and load balancing in distributed graph computation. In Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18-22, 2015, pp. 1307-1317.
[60] Yan, D., Cheng, J., Xing, K., Lu, Y., Ng, W. & Bu, Y. (2014b) Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB7(14), 1821-1832.
[61] Yuan, P., Xie, C., Liu, L. & Jin, H. (2016) Pathgraph: A path centric graph processing system. IEEE Trans. Parallel Distrib. Syst.27(10), 2998-3012.
[62] Zhou, J., Xu, C., Chen, X., Wang, C. & Zhou, X. (2017) Mermaid: Integrating vertex-centric with edge-centric for real-world graph processing. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017, Madrid, Spain, May 14-17, 2017, pp. 780-783.
[63] Zhuo, Y., Chen, J., Luo, Q., Wang, Y., Yang, H., Qian, D. & Qian, X. (2020) Symplegraph: Distributed graph processing with precise loop-carried dependency guarantee. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, Donaldson, A. F. & Torlak, E. (eds). ACM, pp. 592-607.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.