×

Scaling-up reasoning and advanced analytics on BigData. (English) Zbl 1452.68064

Summary: BigDatalog is an extension of Datalog that achieves performance and scalability on both Apache Spark and multicore systems to the point that its graph analytics outperform those written in GraphX. Looking back, we see how this realizes the ambitious goal pursued by deductive database researchers beginning 40 years ago: this is the goal of combining the rigor and power of logic in expressing queries and reasoning with the performance and scalability by which relational databases managed BigData. This goal led to Datalog which is based on Horn Clauses like Prolog but employs implementation techniques, such as semi-naïve fixpoint and magic sets, that extend the bottom-up computation model of relational systems, and thus obtain the performance and scalability that relational systems had achieved, as far back as the 80s, using data-parallelization on shared-nothing architectures. But this goal proved difficult to achieve because of major issues at (i) the language level and (ii) at the system level. The paper describes how (i) was addressed by simple rules under which the fixpoint semantics extends to programs using count, sum and extrema in recursion, and (ii) was tamed by parallel compilation techniques that achieve scalability on multicore systems and Apache Spark. This paper is under consideration for acceptance in Theory and Practice of Logic Programming.

MSC:

68P15 Database theory
68M14 Distributed systems
68N17 Logic programming
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Abiteboul, S.; Hull, R., (1988)
[2] Abiteboul, S.; Hull, R.; Vianu, V., Foundations of Databases: The Logical Level, (1995), Addison-Wesley Longman Publishing: Addison-Wesley Longman Publishing, Boston, MA, USA
[3] Agrawal, R., (1994)
[4] Ameloot, T. J.; Neven, F.; Van Den Bussche, J., (2011)
[5] Aref, M., (2015)
[6] Armbrust, M.; Xin, R. S.; Lian, C.; Huai, Y.; Liu, D.; Bradley, J. K.; Meng, X.; Kaftan, T.; Franklin, M. J.; Ghodsi, A.; Zaharia, M., (2015)
[7] Arni, F.; Ong, K.; Tsur, S.; Wang, H.; Zaniolo, C., The deductive database system LDL++, Theory and Practice of Logic Programming, 3, 1, 61-94, (2003) · Zbl 1087.68559 · doi:10.1017/S1471068402001515
[8] Bell, D. A.; Shao, J.; Hull, M. E. C., A pipelined strategy for processing recursive queries in parallel, Data & Knowledge Engineering, 6, 5, 367-391, (1991) · doi:10.1016/0169-023X(91)90008-L
[9] Borkar, V. R., Declarative systems for large-scale machine learning, IEEE Data Engineering Bulletin, 35, 2, 24-32, (2012)
[10] Borkar, V. R.; Carey, M. J.; Grover, R.; Onose, N.; Vernica, R., (2011)
[11] Bu, Y.; Borkar, V. R.; Carey, M. J.; Rosen, J.; Polyzotis, N.; Condie, T.; Weimer, M.; Ramakrishnan, R., (2012)
[12] Cardoso, J. C.; Baquero, C.; Almeida, P. S., (2009)
[13] Chimenti, D.; O’Hare, A. B.; Krishnamurthy, R.; Tsur, S.; West, C.; Zaniolo, C., An overview of the LDL system, IEEE Data Engineering Bulletin, 10, 4, 52-62, (1987)
[14] Cohen, S.; Wolfson, O., (1989)
[15] Condie, T.; Chu, D.; Hellerstein, J. M.; Maniatis, P., Evita raced: Metacompilation for declarative networks, Proceedings of the VLDB Endowment, 1, 1, 1153-1165, (2008) · doi:10.14778/1453856.1453978
[16] Conway, N.; Marczak, W. R.; Alvaro, P.; Hellerstein, J. M.; Maier, D., (2012)
[17] Das, A.; Zaniolo, C., (2016)
[18] De Kergommeaux, J. C.; Codognet, P., Parallel logic programming systems, ACM Computing Surveys, 26, 3, 295-336, (1994) · doi:10.1145/185403.185453
[19] Dean, J.; Ghemawat, S., (2004)
[20] Erdem, E.; Gelfond, M.; Leone, N., Applications of answer set programming, AI Magazine, 37, 3, 53-68, (2016) · doi:10.1609/aimag.v37i3.2678
[21] Faber, W.; Pfeifer, G.; Leone, N., Semantics and complexity of recursive aggregates in answer set programming, Artificial Intelligence, 175, 1, 278-298, (2011) · Zbl 1216.68263 · doi:10.1016/j.artint.2010.04.002
[22] Fang, M.; Shivakumar, N.; Garcia-Molina, H.; Motwani, R.; Ullman, J. D., (1998)
[23] Ganguly, S.; Greco, S.; Zaniolo, C., (1991)
[24] Ganguly, S.; Greco, S.; Zaniolo, C., Extrema predicates in deductive databases, Journal of Computer and System Sciences, 51, 2, 244-259, (1995) · Zbl 0831.68041 · doi:10.1006/jcss.1995.1064
[25] Ganguly, S.; Silberschatz, A.; Tsur, S., (1990)
[26] Ganguly, S.; Silberschatz, A.; Tsur, S., Parallel bottom-up processing of datalog queries, Journal of Logic Programming, 14, 1, 101-126, (1992) · Zbl 0772.68025 · doi:10.1016/0743-1066(92)90048-8
[27] Gebser, M.; Kaminski, R.; Kaufmann, B.; Schaub, T., (2014)
[28] Gelfond, M.; Zhang, Y., Vicious circle principle and logic programs with aggregates, Theory and Practice of Logic Programming, 14, 4-5, 587-601, (2014) · Zbl 1309.68032 · doi:10.1017/S1471068414000222
[29] Giacometti, A.; Li, D. H.; Marcel, P.; Soulet, A., 20 years of pattern mining: A bibliometric survey, SIGKDD Explorations Newsletter, 15, 1, 41-50, (2014) · doi:10.1145/2594473.2594480
[30] Giannotti, F.; Manco, G., (2002)
[31] Giannotti, F.; Manco, G.; Turini, F., Specifying mining algorithms with iterative user-defined aggregates, IEEE Transactions on Knowledge and Data Engineering, 16, 10, 1232-1246, (2004) · doi:10.1109/TKDE.2004.64
[32] Gonzalez, J. E.; Xin, R. S.; Dave, A.; Crankshaw, D.; Franklin, M. J.; Stoica, I., (2014)
[33] Greco, S.; Zaniolo, C.; Ganguly, S., (1992)
[34] Gupta, G.; Pontelli, E.; Ali, K. A.; Carlsson, M.; Hermenegildo, M. V., Parallel execution of prolog programs: A survey, ACM Transactions on Programming Languages and Systems, 23, 4, 472-602, (2001) · doi:10.1145/504083.504085
[35] Halperin, D.; De Almeida, V. T.; Choo, L. L.; Chu, S.; Koutris, P.; Moritz, D.; Ortiz, J.; Ruamviboonsuk, V.; Wang, J.; Whitaker, A.; Xu, S.; Balazinska, M.; Howe, B.; Suciu, D., (2014)
[36] Han, J.; Pei, J.; Yin, Y., (2000)
[37] Hu, T.; Sung, S. Y.; Xiong, H.; Fu, Q., Discovery of maximum length frequent itemsets, Information Sciences, 178, 1, 69-87, (2008) · doi:10.1016/j.ins.2007.08.006
[38] Interlandi, M.; Tanca, L., (2015)
[39] Kang, U.; Tsourakakis, C. E.; Appel, A. P.; Faloutsos, C.; Leskovec, J., Hadi: Mining radii of large graphs, ACM Transactions on Knowledge Discovery from Data, 5, 2, 8:1-8:24, (2011)
[40] Kemp, D. B.; Stuckey, P. J., (1991)
[41] Kowalski, R. A., Algorithm = logic + control, Communications of the ACM, 22, 7, 424-436, (1979) · Zbl 0404.68010 · doi:10.1145/359131.359136
[42] Leone, N., The DLV system for knowledge representation and reasoning, Transactions on Computational Logic, 7, 3, 499-562, (2006) · Zbl 1367.68308 · doi:10.1145/1149114.1149117
[43] Lewis, D. D., (1998)
[44] Lifschitz, S.; Vianu, V., A probabilistic view of datalog parallelization, Theoretical Computer Science, 190, 2, 211-239, (1998) · Zbl 0893.68051 · doi:10.1016/S0304-3975(97)00091-1
[45] Loo, B. T.; Condie, T.; Garofalakis, M. N.; Gay, D. E.; Hellerstein, J. M.; Maniatis, P.; Ramakrishnan, R.; Roscoe, T.; Stoica, I., (2006)
[46] Loo, B. T.; Condie, T.; Hellerstein, J. M.; Maniatis, P.; Roscoe, T.; Stoica, I., (2005)
[47] Martínez-Angeles, C. A.; Dutra, I.; Costa, V. S., Declarative Programming and Knowledge Management, A datalog engine for GPUs, 152-168, (2014), Springer
[48] Martínez-Angeles, C. A.; Wu, H.; Dutra, I.; Costa, V. S.; Buenabad-Chávez, J., Relational learning with GPUs: Accelerating rule coverage, International Journal of Parallel Programming, 44, 3, 663-685, (2016) · doi:10.1007/s10766-015-0364-7
[49] Matula, D. W.; Beck, L. L., Smallest-last ordering and clustering and graph coloring algorithms, Journal of the ACM, 30, 3, 417-427, (1983) · Zbl 0628.68054 · doi:10.1145/2402.322385
[50] Mazuran, M.; Serra, E.; Zaniolo, C., A declarative extension of horn clauses, and its significance for datalog and its applications, Theory and Practice of Logic Programming, 13, 4, 609-623, (2013) · Zbl 1286.68053 · doi:10.1017/S1471068413000380
[51] Mazuran, M.; Serra, E.; Zaniolo, C., Extending the power of datalog recursion, The VLDB Journal, 22, 4, 471-493, (2013) · doi:10.1007/s00778-012-0299-1
[52] Minker, J.; Seipel, D.; Zaniolo, C., Computational Logic, Logic and databases: A history of deductive databases, 571-627, (2014), Elsevier
[53] Mitchell, T. M., Machine Learning, (1997), McGraw-Hill: McGraw-Hill, Boston, MA · Zbl 0913.68167
[54] Morris, K. A.; Ullman, J. D.; Gelder, A. V., (1986)
[55] Motik, B.; Nenov, Y.; Piro, R.; Horrocks, I.; Olteanu, D., (2014)
[56] Mumick, I. S.; Pirahesh, H.; Ramakrishnan, R., (1990)
[57] Murray, D. G.; Mcsherry, F.; Isaacs, R.; Isard, M.; Barham, P.; Abadi, M., (2013)
[58] Mutharaju, R.; Maier, F.; Hitzler, P., (2010)
[59] Pelov, N.; Denecker, M.; Bruynooghe, M., Well-founded and stable semantics of logic programs with aggregates, Theory and Practice of Logic Programming, 7, 3, 301-353, (2007) · Zbl 1111.68070 · doi:10.1017/S1471068406002973
[60] Przymusinski, T. C., (1988)
[61] Quinlan, J. R., Induction of decision trees, Machine Learning, 1, 1, 81-106, (1986)
[62] Ramakrishnan, R.; Srivastava, D.; Sudarshan, S., (1992)
[63] Ross, K. A.; Sagiv, Y., (1992)
[64] Seib, J.; Lausen, G., (1991)
[65] Seo, J.; Guo, S.; Lam, M. S., (2013)
[66] Seo, J.; Park, J.; Shin, J.; Lam, M. S., Distributed socialite: A datalog-based language for large-scale graph analysis, Proceedings of the VLDB Endowment, 6, 14, 1906-1917, (2013) · doi:10.14778/2556549.2556572
[67] Shin, K.; Eliassi-Rad, T.; Faloutsos, C., (2016)
[68] Shkapsky, A.; Yang, M.; Interlandi, M.; Chiu, H.; Condie, T.; Zaniolo, C., (2016)
[69] Shkapsky, A.; Zeng, K.; Zaniolo, C., Graph queries in a next-generation datalog system, Proceedings of the VLDB Endowment, 6, 12, 1258-1261, (2013) · doi:10.14778/2536274.2536290
[70] Simons, P.; Niemelä, I.; Soininen, T., Extending and implementing the stable model semantics, Artificial Intelligence, 138, 1, 181-234, (2002) · Zbl 0995.68021 · doi:10.1016/S0004-3702(02)00187-X
[71] Son, T. C.; Pontelli, E., A constructive semantic characterization of aggregates in answer set programming, Theory and Practice of Logic Programming, 7, 3, 355-375, (2007) · Zbl 1111.68072 · doi:10.1017/S1471068406002936
[72] Sudarshan, S.; Ramakrishnan, R., (1991)
[73] Swift, T.; Warren, D. S., (2010)
[74] Swift, T.; Warren, D. S., XSB: Extending prolog with tabled logic programming, Theory and Practice of Logic Programming, 12, 1, 157-187, (2012) · Zbl 1244.68021 · doi:10.1017/S1471068411000500
[75] Tachmazidis, I.; Antoniou, G.; Faber, W., Efficient computation of the well-founded semantics over big data, Theory and Practice of Logic Programming, 14, 4-5, 445-459, (2014) · Zbl 1307.68024 · doi:10.1017/S1471068414000131
[76] Tachmazidis, I.; Antoniou, G.; Flouris, G.; Kotoulas, S.; Mccluskey, L., (2012)
[77] Tsur, S., (1991)
[78] Urbani, J.; Jacobs, C. J.; Krötzsch, M., (2016)
[79] Urbani, J.; Kotoulas, S.; Maassen, J.; Van Harmelen, F.; Bal, H., Webpie: A web-scale parallel inference engine using MapReduce, Web Semantics: Science, Services and Agents on the World Wide Web, 10, 59-75, (2012) · doi:10.1016/j.websem.2011.05.004
[80] Vaghani, J.; Ramamohanarao, K.; Kemp, D. B.; Somogyi, Z.; Stuckey, P. J.; Leask, T. S.; Harland, J., The Aditi deductive database system, VLDB Journal, 3, 2, 245-288, (1994) · doi:10.1007/BF01228882
[81] Van Gelder, A., (1993)
[82] Venu, B., (2011)
[83] Wang, J.; Balazinska, M.; Halperin, D., Asynchronous and fault-tolerant recursive Datalog evaluation in shared-nothing engines, Proceedings of the VLDB Endowment, 8, 12, 1542-1553, (2015) · doi:10.14778/2824032.2824052
[84] Wolfson, O.; Ozeri, A., (1990)
[85] Wolfson, O.; Silberschatz, A., (1988)
[86] Yang, M., (2017)
[87] Yang, M.; Shkapsky, A.; Zaniolo, C., (2015)
[88] Yang, M.; Shkapsky, A.; Zaniolo, C., Scaling up the performance of more powerful datalog systems on multicore machines, VLDB Journal, 26, 2, 229-248, (2017) · doi:10.1007/s00778-016-0448-z
[89] Yang, M.; Zaniolo, C., (2014)
[90] Yu, Y.; Gunda, P. K.; Isard, M., (2009)
[91] Zaharia, M.; Chowdhury, M.; Das, T.; Dave, A.; Ma, J.; Mccauley, M.; Franklin, M. J.; Shenker, S.; Stoica, I., (2012)
[92] Zaniolo, C.; Yang, M.; Interlandi, M.; Das, A.; Shkapsky, A.; Condie, T., Fixpoint semantics and optimization of recursive datalog programs with aggregates, Theory and Practice of Logic Programming, 17, 5-6, 1048-1065, (2017) · Zbl 1422.68162 · doi:10.1017/S1471068417000436
[93] Zaniolo, C.; Yang, M.; Interlandi, M.; Das, A.; Shkapsky, A.; Condie, T., (2018)
[94] Zhang, W.; Wang, K.; Chau, S.-C., Data partition and parallel evaluation of datalog programs, IEEE Transactions on Knowledge and Data Engineering, 7, 1, 163-176, (1995) · doi:10.1109/69.368511
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.