
A survey on the distributed computing stack. (English) Zbl 1486.68018

Summary: In this paper, we review the background and the state of the art of the Distributed Computing software stack. We aim to provide readers with a comprehensive overview of this area by supplying a detailed big picture of the latest technologies. First, we introduce the general background of Distributed Computing and propose a layered, top-down classification of the latest available software. Next, we focus on each abstraction layer, i.e., Application Development (including Task-based Workflows, Dataflows, and Graph Processing), Platform (including Data Sharing and Resource Management), Communication (including Remote Invocation, Message Passing, and Message Queuing), and Infrastructure (including Batch and Interactive systems). For each layer, we give a general background, discuss its technical challenges, review the latest programming languages, programming models, frameworks, libraries, and tools, and provide a summary table comparing the features of each alternative. Finally, we conclude this survey with a discussion of open problems and future directions.

MSC:

68M14 Distributed systems
68W15 Distributed algorithms
68-02 Research exposition (monographs, survey articles) pertaining to computer science

References:

[1] Asanovic, K., The Landscape of Parallel Computing Research: A View from Berkeley, Vol. 2, Technical Report UCB/EECS-2006-183 (2006), EECS Department, University of California, Berkeley
[2] Foster, I.; Kesselman, C., The Grid 2: Blueprint for a New Computing Infrastructure (2003), Elsevier
[3] Krauter, K.; Buyya, R.; Maheswaran, M., A taxonomy and survey of grid resource management systems for distributed computing, Softw. - Pract. Exp., 32, 2, 135-164 (2002) · Zbl 0987.68786
[4] Kumar, V., Introduction to Parallel Computing: Design and Analysis of Algorithms, Vol. 400 (1994), Benjamin/Cummings, Redwood City
[5] Asanovic, K., A view of the parallel computing landscape, Commun. ACM, 52, 10, 56-67 (2009)
[6] Kaisler, S., Big data: Issues and challenges moving forward, (46th Hawaii International Conference on System Sciences (2013), IEEE), 995-1004
[7] Sagiroglu, S.; Sinanc, D., Big data: A review, (International Conference on Collaboration Technologies and Systems (CTS) (2013), IEEE), 42-47
[8] Russom, P., Big data analytics, (TDWI Best Practices Report, Fourth Quarter 19 (2011))
[9] Dongarra, J., The international Exascale Software Project roadmap, Int. J. High Perform. Comput. Appl., 25, 1, 3-60 (2011)
[10] Reed, D. A.; Dongarra, J., Exascale computing and big data, Commun. ACM, 58, 7, 56-68 (2015)
[11] Deelman, E., Big data analytics and high performance computing convergence through workflows and virtualization, (Big Data and Extreme-Scale Computing (2016))
[12] Caíno-Lores, S.; Isaila, F.; Carretero, J., Data-aware support for hybrid HPC and big data applications, (2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) (2017)), 719-722
[13] Hsu, C.; Fox, G.; Min, G.; Sharma, S., Advances in big data programming, system software and HPC convergence, J. Supercomput., 75, 489-493 (2019)
[14] Fox, G., Big data, simulations and HPC convergence, (Rabl, T.; etal., Big Data Benchmarking (2016), Springer: Springer Cham), 3-17
[15] Zaharia, M., Spark: Cluster computing with working sets, HotCloud, 10, 10-10, 95 (2010)
[16] Toshniwal, A., Storm@twitter, (Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (2014), ACM), 147-156
[17] Abadi, M., TensorFlow: Large-scale machine learning on heterogeneous distributed systems, 1-19 (2016), arXiv preprint arXiv:1603.04467
[18] Liu, J., A survey of data-intensive scientific workflow management, J. Grid Comput., 13, 4, 457-493 (2015)
[19] Rimal, B. P.; Choi, E.; Lumb, I., A taxonomy and survey of cloud computing systems, (2009 Fifth International Joint Conference on INC, IMS and IDC (2009)), 44-51
[20] Kacfah Emani, C.; Cullot, N.; Nicolle, C., Understandable big data: A survey, Comp. Sci. Rev., 17, 70-81 (2015)
[21] Apache Airflow (2019), http://airflow.apache.org, accessed 2 October 2019
[22] Vecchiola, C.; Chu, X.; Buyya, R., Aneka: A software platform for .NET-based cloud computing, (High Speed and Large Scale Scientific Computing, Vol. 18 (2009)), 267-295
[23] Fahringer, T., Askalon: A grid application development and computing environment, (Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing (2005), IEEE), 122-131
[24] Manubens-Gil, D., Seamless management of ensemble climate prediction experiments on HPC platforms, (2016 International Conference on High Performance Computing Simulation (HPCS) (2016)), 895-900
[25] Celery (2019), http://www.celeryproject.org, accessed 2 October 2019
[26] D.G. Murray, et al. CIEL: a universal execution engine for distributed data-flow computing, in: Proceedings of the 8th ACM/USENIX Symposium on Networked Systems Design and Implementation, 2011, pp. 113-126.
[27] COMP superscalar (COMPSs) (2019), https://compss.bsc.es, accessed 2 October 2019
[28] Pronk, S., Copernicus: A new paradigm for parallel adaptive molecular dynamics, (Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (2011), ACM), 60:1-60:10
[29] Apache Crunch (2013), https://crunch.apache.org, accessed 2 October 2019
[30] Dask: Library for dynamic task scheduling (2019), http://dask.pydata.org, accessed 2 October 2019
[31] EcFlow (2019), https://confluence.ecmwf.int/display/ECFLOW, accessed 2 October 2019
[32] Jain, A., FireWorks: a dynamic workflow system designed for high-throughput applications, Concurr. Comput.: Pract. Exper., 27 (2015)
[33] Afgan, E., The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update, (Nucleic Acids Research (2016)), gkw343
[34] Dean, J.; Ghemawat, S., MapReduce: Simplified data processing on large clusters, (Proc. of the 6th Conf. on Symposium on Operating Systems Design and Implementation - Volume 6, OSDI’04 (2004), USENIX Association, USA), 10
[35] Montesi, F., Jolie: a Java orchestration language interpreter engine, Electron. Notes Theor. Comput. Sci., 181, 19-33 (2007)
[36] Altintas, I., Kepler: an extensible system for design and execution of scientific workflows, (Proceedings. 16th International Conference on Scientific and Statistical Database Management (2004), IEEE), 423-424
[37] Netflix conductor (2019), https://netflix.github.io/conductor, accessed 2 October 2019
[38] Deelman, E., Pegasus, a workflow management system for science automation, Future Gener. Comput. Syst., 46, 17-35 (2015)
[39] Wilde, M., Swift: A language for distributed parallel scripting, Parallel Comput., 37, 9, 633-652 (2011)
[40] Hull, D., Taverna: a tool for building and running workflows of services, Nucleic Acids Res., 34, Web Server issue, W729-W732 (2006)
[41] Apache License, version 2.0 (2019), https://www.apache.org/licenses/LICENSE-2.0, accessed 2 October 2019
[42] GNU General Public License, version 2 (2017), https://www.gnu.org/licenses/old-licenses/gpl-2.0.html, accessed 2 October 2019
[43] GNU General Public License, version 3 (2016), https://www.gnu.org/licenses/gpl-3.0.en.html, accessed 2 October 2019
[44] GNU Lesser General Public License, version 2.1 (2018), https://www.gnu.org/licenses/old-licenses/lgpl-2.1.html, accessed 2 October 2019
[45] BSD License (2005), http://www.linfo.org/bsdlicense, accessed 2 October 2019
[46] MIT License (2019), https://opensource.org/licenses/MIT, accessed 2 October 2019
[47] Academic Free License version 3.0 (2005), https://opensource.org/licenses/AFL-3.0, accessed 2 October 2019
[48] Mozilla Public License version 2.0 (2019), https://www.mozilla.org/en-US/MPL/2.0, accessed 2 October 2019
[49] Eclipse Public License v1.0 (2019), https://www.eclipse.org/legal/epl-v10.html, accessed 2 October 2019
[50] Apache Apex (2019), https://apex.apache.org, accessed 2 October 2019
[51] Apache Beam (2019), https://beam.apache.org, accessed 2 October 2019
[52] Cascading (2018), http://www.cascading.org, accessed 2 October 2019
[53] Apache Gearpump (2019), https://gearpump.apache.org, accessed 2 October 2019
[54] Hazelcast Jet (2019), https://jet.hazelcast.org, accessed 2 October 2019
[55] Kulkarni, S., Twitter heron: Stream processing at scale, (Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (2015), ACM), 239-250
[56] Hirzel, M., IBM streams processing language: Analyzing big data in motion, IBM J. Res. Dev., 57, 3/4, 7-11 (2013)
[57] Schmaus, B., Netflix Blog: Stream processing with Mantis (2016), https://medium.com/netflix-techblog/stream-processing-with-mantis-78af913f51a6, accessed 2 October 2019
[58] Apache Samza (2019), http://samza.apache.org, accessed 2 October 2019
[59] Zaharia, M., Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters, HotCloud, 12, 10-16 (2012)
[60] Apache Hama (2016), https://hama.apache.org, accessed 2 October 2019
[61] Buluç, A.; Gilbert, J. R., The combinatorial BLAS: Design, implementation, and applications, Int. J. High Perform. Comput. Appl., 25, 4, 496-509 (2011)
[62] Azad, A.; Buluç, A.; Gilbert, J. R., Combinatorial BLAS (CombBLAS) (2018), https://people.eecs.berkeley.edu/ aydin/CombBLAS/html/index.html, accessed 2 October 2019
[63] Amelkin, V.; Buluç, A.; Gilbert, J. R., Knowledge Discovery Toolbox (KDT) (2013), http://kdt.sourceforge.net, accessed 2 October 2019
[64] Distributed R (2019), https://marketplace.microfocus.com/vertica/content/distributed-r, accessed 2 October 2019
[65] Giraph (2019), http://giraph.apache.org, accessed 2 October 2019
[66] Simmhan, Y., GoFFish: A sub-graph centric framework for large-scale graph analytics, (Euro-Par 2014 Parallel Processing (2014), Springer), 451-462
[67] GoFFish (2017), http://dream-lab.cds.iisc.ac.in/projects/goffish, accessed 2 October 2019
[68] Xin, R. S., GraphX: A resilient distributed graph system on Spark, (First International Workshop on Graph Data Management Experiences and Systems (2013), ACM), 1-6
[69] Apache Spark - GraphX (2018), https://spark.apache.org/graphx, accessed 2 October 2019
[70] Shao, B.; Wang, H.; Li, Y., Trinity: A distributed graph engine on a memory cloud, (Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (2013), ACM), 505-516
[71] Microsoft trinity project (2019), https://www.microsoft.com/en-us/research/project/trinity, accessed 2 October 2019
[72] Graph Engine (2017), https://www.graphengine.io, accessed 2 October 2019
[73] Salihoglu, S.; Widom, J., GPS: a graph processing system, (Proceedings of the 25th International Conference on Scientific and Statistical Database Management (2013), ACM), 1-12
[74] Widom, J., GPS: Graph Processing System (2014), http://infolab.stanford.edu/gps, accessed 2 October 2019
[75] Wang, P., Replication-based fault-tolerance for large-scale graph processing, (2014 44th Annual IEEE/IFIP Int. Conf. on Dependable Systems and Networks (DSN) (2014), IEEE), 562-573
[76] Imitator (2014), http://ipads.se.sjtu.edu.cn/projects/imitator.html, accessed 2 October 2019
[77] Gregor, D.; Lumsdaine, A., The parallel BGL: A generic library for distributed graph computations, (Parallel Object-Oriented Scientific Computing (POOSC), Vol. 2 (2005), Citeseer), 1-18
[78] Edmonds, N.; Gregor, D.; Lumsdaine, A., Parallel boost graph library (2009), http://www.boost.org/doc/libs/1_53_0/libs/graph_parallel/doc/html/index.html, accessed 2 October 2019 · Zbl 1209.05239
[79] Gonzalez, J. E., PowerGraph: Distributed graph-parallel computation on natural graphs, (Presented As Part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12) (2012), USENIX), 17-30
[80] Chen, R., PowerLyra: Differentiated graph computation and partitioning on skewed graphs, (Proceedings of the Tenth European Conference on Computer Systems (2015), ACM), 1:1-1:15
[81] Chen, R., PowerLyra (2013), http://ipads.se.sjtu.edu.cn/projects/powerlyra.html, accessed 2 October 2019
[82] Malewicz, G., Pregel: a system for large-scale graph processing, (Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (2010), ACM), 135-146
[83] Bu, Y., Pregelix: Big(ger) graph analytics on a dataflow engine, Proc. VLDB Endow., 8, 2, 161-172 (2014)
[84] Pregelix (2014), http://pregelix.ics.uci.edu, accessed 2 October 2019
[85] Venkataraman, S., Presto: Distributed machine learning and graph processing with sparse matrices, (Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys 2013) (2013), ACM), 197-210
[86] Xue, J., Processing concurrent graph analytics with decoupled computation model, IEEE Trans. Comput., 66, 5, 876-890 (2017) · Zbl 1368.68269
[87] Zandifar, M., The STAPL skeleton framework, (International Workshop on Languages and Compilers for Parallel Computing (2014), Springer), 176-190
[88] STAPL: Standard Template Adaptive Parallel Library (Parasol) (2017), https://parasol.tamu.edu/groups/rwergergroup/research/stapl, accessed 2 October 2019
[89] Titan Hadoop (Faunus) (2015), https://github.com/thinkaurelius/faunus, accessed 2 October 2019
[90] Low, Y., Distributed GraphLab: a framework for machine learning and data mining in the cloud, Proc. VLDB Endow., 5, 8, 716-727 (2012)
[91] Turi create (2018), https://turi.com, accessed 2 October 2019
[92] Doekemeijer, N.; Varbanescu, A. L., A Survey of Parallel Graph Processing FrameworksTechnical Report PDS-2014-003 (2014), Delft University of Technology
[93] Valiant, L. G., A bridging model for parallel computation, Commun. ACM, 33, 8, 103-111 (1990)
[94] El-Ghazawi, T., Unified parallel C (2005), http://upc.gwu.edu, accessed 2 October 2019
[95] UPC Language Specifications V1.2, Tech. rep. (2005), UPC Consortium
[96] Coarfa, C., An evaluation of global address space languages: co-array fortran and unified parallel C, (Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2005), ACM), 36-47
[97] Chamberlain, B. L.; Callahan, D.; Zima, H. P., Parallel programmability and the chapel language, Int. J. High Perform. Comput. Appl., 21, 3, 291-312 (2007)
[98] The chapel parallel programming language (2019), https://chapel-lang.org, accessed 2 October 2019
[99] Fürlinger, K.; Fuchs, T.; Kowalewski, R., DASH: a C++ PGAS library for distributed data structures and parallel algorithms, (2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) (2016), IEEE), 983-990
[100] DASH (2018), http://www.dash-project.org, accessed 2 October 2019
[101] Allen, E., The Fortress language specification, Sun Microsyst., 139, 140, 116 (2005)
[102] Numrich, R. W.; Reid, J., Co-array Fortran for parallel programming, SIGPLAN Fortran Forum, 17, 2, 1-31 (1998)
[103] GPI-2: Programming next generation supercomputers (2019), http://www.gpi-site.com, accessed 2 October 2019
[104] Chapman, B., Introducing OpenSHMEM: SHMEM for the PGAS community, (Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model (2010), Association for Computing Machinery), 1-3
[105] OpenSHMEM (2019), http://www.openshmem.org, accessed 12 February 2020
[106] Yelick, K., Titanium: a high-performance Java dialect, Concurr. Comput.: Pract. Exper., 10, 11-13, 825-836 (1998)
[107] Hilfinger, P. N., Titanium language reference manual (2006)
[108] Titanium (2014), http://titanium.cs.berkeley.edu, accessed 2 October 2019
[109] Charles, P., X10: An object-oriented approach to non-uniform cluster computing, SIGPLAN Not., 40, 10, 519-538 (2005)
[110] Saraswat, V., X10 language specification - Version 2.6.2 (2019)
[111] The X10 parallel programming language (2018), http://x10-lang.org, accessed 2 October 2019
[112] PGAS: Partitioned Global Address Space (2016), http://www.pgas.org, accessed 2 October 2019
[113] Tardieu, O., The APGAS library: Resilient parallel and distributed programming in Java 8, (Proceedings of the ACM SIGPLAN Workshop on X10, X10 2015 (2015), ACM), 25-26
[114] Breitbart, J.; Schmidtobreick, M.; Heuveline, V., Evaluation of the global address space programming interface (GASPI), (2014 IEEE International Parallel Distributed Processing Symposium Workshops (2014)), 717-726
[115] GASPI: Global address space programming interface (2019), http://www.gaspi.de, accessed 2 October 2019
[116] Alrutz, T., GASPI - A partitioned global address space programming interface, (Facing the Multicore-Challenge III: Aspects of New Paradigms and Technologies in Parallel Computing (2013), Springer), 135-136
[117] What is RDMA? (2019), https://community.mellanox.com/s/article/what-is-rdma-x, accessed 12 February 2020
[118] Bonachea, D.; Hargrove, P. H., GASNet-EX: A high-performance, portable communication library for exascale, (International Workshop on Languages and Compilers for Parallel Computing (2018), Springer), 138-158
[119] GASNet (2020), https://gasnet.lbl.gov, accessed 12 February 2020
[120] Heichler, J., An Introduction to BeeGFS, Tech. rep. (2014), BeeGFS, URL https://www.beegfs.io/docs/whitepapers/Introduction_to_BeeGFS_by_ThinkParQ.pdf
[121] BeeGFS (2019), https://www.beegfs.io, accessed 2 October 2019
[122] Weil, S. A., Ceph: A scalable, high-performance distributed file system, (Proceedings of the 7th Symposium on Operating Systems Design and Implementation, OSDI’06 (2006), USENIX Association, USA), 307-320
[123] DataPlow nasan file system (2019), http://www.dataplow.com/Products.htm#Nasan, accessed 2 October 2019
[124] Vef, M.-A., GekkoFS - A temporary distributed file system for HPC applications, (2018 IEEE International Conference on Cluster Computing (CLUSTER) (2018), IEEE), 319-324
[125] GekkoFS (2020), https://github.com/NGIOproject/GekkoFS, accessed 22 May 2020
[126] Boyer, E. B.; Broomfield, M. C.; Perrotti, T. A., GlusterFS: One Storage Server to Rule them All, Tech. rep. (2012), Los Alamos National Lab. (LANL), Los Alamos, NM (United States), URL https://www.osti.gov/biblio/1048672
[127] Davies, A.; Orsaria, A., Scale out with GlusterFS, Linux J., 2013, 235 (2013)
[128] Ghemawat, S.; Gobioff, H.; Leung, S., The Google file system, SIGOPS Oper. Syst. Rev., 37, 5, 29-43 (2003)
[129] Shvachko, K., The Hadoop distributed file system, (2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST) (2010), IEEE), 1-10
[130] Schmuck, F.; Haskin, R., GPFS: A shared-disk file system for large computing clusters, (Proceedings of the 1st USENIX Conference on File and Storage Technologies, FAST’02 (2002), USENIX Association, USA), 16
[131] Infinit storage platform (2015), https://infinit.sh/reference, accessed 2 October 2019
[132] LizardFS (2019), https://lizardfs.com, accessed 2 October 2019
[133] Faibish, S., Lustre file system (2017), US Patent 9,779,108
[134] D’amato, A., Cluster shared volumes (2010), US Patent 7,840,730
[135] MooseFS (2019), https://moosefs.com, accessed 2 October 2019
[136] Nagle, D.; Serenyi, D.; Matthews, A., The Panasas ActiveScale storage cluster: Delivering scalable high bandwidth storage, (Proceedings of the 2004 ACM/IEEE Conference on Supercomputing (2004), IEEE), 53
[137] Panasas ActiveStor Architecture Overview, Tech. rep. (2017), White paper, URL http://performance.panasas.com/wp-architecture-hp-thanks.html
[138] Carns, P. H., PVFS: A parallel file system for Linux clusters, (Proceedings of the 4th Annual Linux Showcase and Conference - Volume 4, ALS’00 (2000), USENIX Association), 28-29
[139] The OrangeFS Project (2018), http://www.orangefs.org, accessed 2 October 2019
[140] Whitehouse, S., The GFS2 filesystem, (Proceedings of the Linux Symposium (2007), Citeseer), 253-259
[141] Red Hat Global File System, Tech. rep. (2004), RedHat, URL https://listman.redhat.com/whitepapers/rha/gfs/GFS_INS0032US.pdf
[142] Shepard, L.; Eppe, E., SGI® InfiniteStorage Shared Filesystem CXFS: A High-Performance, Multi-OS Filesystem from SGI, Tech. rep. (2003), Whitepaper 2691, URL https://jarrang.com/client/sgi/storage/StorageCD/Collateral/DataSheets/GeneralAndWhitepapers/SGIInfiniteStorageSharedFilesystemCXFSwhitepaper.pdf
[143] Stender, J.; Berlin, M.; Reinefeld, A., XtreemFS: A file system for the cloud, (Data Intensive Storage Services for Cloud Environments (2013), IGI Global), 267-285
[144] DynamoDB (2019), https://aws.amazon.com/es/dynamodb, accessed 4 December 2019
[145] Lakshman, A.; Malik, P., Cassandra: a decentralized structured storage system, Oper. Syst. Rev., 44, 2, 35-40 (2010)
[146] Anderson, J. C.; Lehnardt, J.; Slater, N., CouchDB: The Definitive Guide (2010), O’Reilly Media Inc.
[147] Apache CouchDB (2019), https://couchdb.apache.org, accessed 4 December 2019
[148] Martí, J., Dataclay: A distributed data store for effective inter-player data sharing, J. Syst. Softw., 131, 129-145 (2017)
[149] DataClay (2019), https://www.bsc.es/research-and-development/software-and-apps/software-list/dataclay, accessed 4 December 2019
[150] Hazelcast IMDG (2019), https://hazelcast.com/products/imdg, accessed 4 December 2019
[151] Vora, M. N., Hadoop-hbase for large-scale data, (Proceedings of 2011 International Conference on Computer Science and Network Technology, Vol. 1 (2011), IEEE), 601-605
[152] Apache HBase (2019), https://hbase.apache.org, accessed 4 December 2019
[153] Alomar, G.; Becerra, Y.; Torres, J., Hecuba: NoSQL made easy, (BSC Doctoral Symposium (2nd: 2015: Barcelona) (2015), Barcelona Supercomputing Center), 136-137
[154] Tejedor, E., PyCOMPSs: Parallel computational workflows in Python, Int. J. High Perform. Comput. Appl. (IJHPCA), 31, 1, 66-82 (2017)
[155] Hecuba (2019), https://github.com/bsc-dd/hecuba, accessed 4 December 2019
[156] Intersystems cache (2019), https://www.intersystems.com/products/cache, accessed 4 December 2019
[157] JanusGraph (2019), https://janusgraph.org, accessed 4 December 2019
[158] Thinkaurelius, E., Titan: Distributed graph database (2015), http://titan.thinkaurelius.com, accessed 11 May 2020
[159] Memcached (2018), https://memcached.org, accessed 4 December 2019
[160] Banker, K., MongoDB in Action (2011), Manning Publications Co.
[161] MongoDB: The most popular database for modern apps (2019), https://www.mongodb.com, accessed 4 December 2019
[162] Suehring, S., MySQL Bible (2002), John Wiley & Sons Inc.
[163] MySQL (2019), https://www.mysql.com, accessed 4 December 2019
[164] Tesoriero, C., Getting Started with OrientDB (2013), Packt Publishing Ltd
[165] OrientDB: The database designed for the modern world (2019), https://orientdb.com, accessed 4 December 2019
[166] Ousterhout, J., The case for RAMClouds: scalable high-performance storage entirely in DRAM, Oper. Syst. Rev., 43, 4, 92-105 (2010)
[167] Ousterhout, J., RAMCloud project (2019), https://ramcloud.atlassian.net/wiki/spaces/RAM/overview, accessed 22 May 2020
[168] Macedo, T.; Oliveira, F., Redis Cookbook: Practical Techniques for Fast Data Manipulation (2011), O’Reilly Media Inc.
[169] Redis (2019), https://redis.io, accessed 4 December 2019
[170] Riak (2019), https://riak.com/riak, accessed 4 December 2019
[171] Virtuoso: Data-driven agility without compromise (2019), https://virtuoso.openlinksw.com, accessed 4 December 2019
[172] Consul (2019), https://www.consul.io, accessed 4 December 2019
[173] Rarick, K., Introducing Doozerd (2011), https://xph.us/2011/04/13/introducing-doozer.html, accessed 4 December 2019
[174] Etcd: A distributed, reliable key-value store for the most critical data of a distributed system (2019), https://etcd.io, accessed 4 December 2019
[175] Netflix Eureka - GitHub repository (2019), https://github.com/Netflix/eureka, accessed 4 December 2019
[176] Burrows, M., The chubby lock service for loosely-coupled distributed systems, (Proceedings of the 7th Symposium on Operating Systems Design and Implementation (2006), USENIX Association), 335-350
[177] Chubby: A lock service for distributed coordination (2018), https://medium.com/coinmonks/chubby-a-centralized-lock-service-for-distributed-applications-390571273052, accessed 4 December 2019
[178] Serf (2019), https://www.serf.io, accessed 4 December 2019
[179] Hunt, P., ZooKeeper: Wait-free coordination for internet-scale systems, (USENIX Annual Technical Conference, Vol. 8 (2010), Boston, MA, USA), 1-14
[180] Apache ZooKeeper (2019), https://zookeeper.apache.org, accessed 4 December 2019
[181] Serf vs. ZooKeeper, doozerd, etcd (2019), https://www.serf.io/intro/vs-zookeeper.html, accessed 4 December 2019
[182] Glushkov, I., Comparing ZooKeeper and consul (2014), https://es.slideshare.net/IvanGlushkov/zookeeper-vs-consul-41882991, accessed 4 December 2019
[183] Farcic, V., Service discovery: Zookeeper vs etcd vs consul (2015), https://technologyconversations.com/2015/09/08/service-discovery-zookeeper-vs-etcd-vs-consul, accessed 4 December 2019
[184] Lamport, L., Paxos made simple, ACM Sigact News, 32, 4, 18-25 (2001)
[185] Ongaro, D.; Ousterhout, J., In search of an understandable consensus algorithm, (2014 USENIX Annual Technical Conference (USENIX ATC 14) (2014)), 305-319
[186] Birman, K., The promise, and limitations, of gossip protocols, Oper. Syst. Rev., 41, 5, 8-13 (2007)
[187] Datadog: Cloud monitoring as a service (2019), https://www.datadoghq.com, accessed 4 December 2019
[188] Willnecker, F.; Brunnert, A.; Gottesheim, W.; Krcmar, H., Using Dynatrace monitoring data for generating performance models of Java EE applications, (Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering (2015), Association for Computing Machinery), 103-104
[189] Dynatrace: The leader in cloud monitoring (2021), https://www.dynatrace.com, accessed 21 May 2021
[190] Elasticsearch, B. V., ELK stack: Elasticsearch, Logstash, Kibana (2019), https://www.elastic.co/what-is/elk-stack, accessed 4 December 2019
[191] Graylog (2019), https://www.graylog.org, accessed 4 December 2019
[192] Villella, P.; Petersen, C., Log collection, structuring and processing (2010), US Patent 7,653,633
[193] LogRhythm: The security intelligence company (2019), https://logrhythm.com, accessed 4 December 2019
[194] Barth, W., Nagios: System and Network Monitoring (2008), No Starch Press
[195] Nagios - the industry standard in IT infrastructure monitoring (2019), https://www.nagios.org, accessed 4 December 2019
[196] New Relic (2021), https://newrelic.com, accessed 21 May 2021
[197] Log management by loggly (2019), https://www.loggly.com, accessed 4 December 2019
[198] Carasso, D., Exploring Splunk (2012), CITO Research, New York, USA
[199] Splunk: SIEM, AIOps, application management, log management, machine learning, and compliance (2019), https://www.splunk.com, accessed 4 December 2019
[200] Massie, M. L.; Chun, B. N.; Culler, D. E., The ganglia distributed monitoring system: design, implementation, and experience, Parallel Comput., 30, 7, 817-840 (2004)
[201] Ganglia monitoring system (2018), http://ganglia.info, accessed 2 October 2019
[202] Icinga (2019), https://icinga.com, accessed 4 December 2019
[203] Pandora FMS: The flexible monitoring software for large business (2019), https://pandorafms.com, accessed 4 December 2019
[204] Sensu (2019), https://sensu.io, accessed 4 December 2019
[205] Olups, R., Zabbix Network Monitoring (2016), Packt Publishing Ltd
[206] Zabbix: The enterprise-class open source network monitoring solution (2019), https://www.zabbix.com, accessed 4 December 2019
[207] Badger, M., Zenoss Core Network and System Monitoring (2008), Packt Publishing Ltd
[208] Zenoss: Intelligent application and service monitoring + aiops (2019), https://www.zenoss.com, accessed 4 December 2019
[209] Forster, F., Collectd (2019), https://collectd.org, accessed 4 December 2019
[210] Fluentd: Open source data collector and unified logging layer (2019), https://www.fluentd.org, accessed 4 December 2019
[211] Hoffman, S., Apache Flume: Distributed Log Collection for Hadoop (2013), Packt Publishing Ltd
[212] Apache Flume (2019), https://flume.apache.org, accessed 4 December 2019
[213] Prometheus: From metrics to insight (2019), https://prometheus.io, accessed 4 December 2019
[214] Scribe: Transporting petabytes per hour via a distributed, buffered queueing system (2019), https://engineering.fb.com/data-infrastructure/scribe, accessed 4 December 2019
[215] Open source monitoring tools (2016), http://opentica.com/en/2016/02/02/open-source-monitoring-tools, accessed 4 December 2019
[216] Kufel, L., Tools for distributed systems monitoring, Found. Comput. Decis. Sci., 41, 4, 237-260 (2016)
[217] Bhargava, R., Best of 2018: Log monitoring and analysis: Comparing ELK, Splunk and Graylog (2018), https://devops.com/log-monitoring-and-analysis-comparing-elk-splunk-and-graylog, accessed 4 December 2019
[218] Keary, T., Nagios vs zabbix compared – which is better for network monitoring? (2018), https://www.comparitech.com/net-admin/nagios-vs-zabbix, accessed 4 December 2019
[219] Peri, N., Fluentd vs. Logstash: A comparison of log collectors (2015), https://logz.io/blog/fluentd-logstash, accessed 4 December 2019
[220] Gartner magic quadrant for application performance monitoring (2020), https://www.gartner.com/en/documents/3983892/magic-quadrant-for-application-performance-monitoring, accessed 21 May 2021
[221] Elasticsearch, B. V., Elasticsearch: The official distributed search and analytics (2019), https://www.elastic.co/products/elasticsearch, accessed 4 December 2019
[222] Turnbull, J., The Logstash Book (2013), James Turnbull
[223] Elasticsearch, B. V., Logstash: Collect, parse, transform logs (2019), https://www.elastic.co/products/logstash, accessed 4 December 2019
[224] Elasticsearch, B. V., Kibana: Explore, visualize, discover data (2019), https://www.elastic.co/products/kibana, accessed 4 December 2019
[225] Protocol buffers (2019), https://developers.google.com/protocol-buffers, accessed 3 December 2019
[226] Snyder, B.; Bosnanac, D.; Davies, R., ActiveMQ in Action, Vol. 47 (2011), Manning, Greenwich, Conn.
[227] Apache ActiveMQ (2019), https://activemq.apache.org, accessed 3 December 2019
[228] Gupta, M., Akka Essentials (2012), Packt Publishing Ltd
[229] AKKA documentation - classic actors (2019), https://doc.akka.io/docs/akka/current/actors.html, accessed 3 December 2019
[230] AKKA documentation - streams (2019), https://doc.akka.io/docs/akka/current/stream/index.html, accessed 3 December 2019
[231] Apache Qpid (2015), https://qpid.apache.org, accessed 3 December 2019
[232] Cap’n proto: Introduction (2013), https://capnproto.org, accessed 3 December 2019
[233] Carbone, P., Apache flink: Stream and batch processing in a single engine, Bull. IEEE Comput. Soc. Techn. Committ. Data Eng., 36, 4 (2015)
[234] GRPC motivation and design principles (2015), https://grpc.io/blog/principles, accessed 3 December 2019
[235] Java remote method invocation specification (2017), https://docs.oracle.com/javase/9/docs/specs/rmi, accessed 3 December 2019
[236] Ban, B., JGroups, a toolkit for reliable multicast communication (2002)
[237] J. Kreps, N. Narkhede, J. Rao, et al. Kafka: A distributed messaging system for log processing, in: Proceedings of the NetDB, 2011, pp. 1-7.
[238] Apache Kafka (2017), https://kafka.apache.org, accessed 3 December 2019
[239] Gabriel, E., Open MPI: Goals, concept, and design of a next generation MPI implementation, (European Parallel Virtual Machine/Message Passing Interface Users’ Group Meeting (2004), Springer), 97-104
[240] Videla, A.; Williams, J. J., RabbitMQ in Action: Distributed Messaging for Everyone (2012), Manning
[241] The Spread toolkit (2006)
[242] Prunicki, A., Apache Thrift, Tech. rep. (2009), Object Computing, Inc.
[243] Hintjens, P., ZeroMQ: Messaging for Many Applications (2013), O’Reilly Media Inc.
[244] Tanenbaum, A. S.; Van Steen, M., Distributed Systems: Principles and Paradigms (2007), Prentice-Hall · Zbl 1157.68323
[245] Enduro/X middleware platform for distributed transaction processing (2015), https://www.endurox.org, accessed 3 December 2019
[246] Gentzsch, W., Sun Grid Engine: Towards creating a compute power grid, (Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid (2001), IEEE), 35-36
[247] Vavilapalli, V. K., Apache Hadoop YARN: Yet another resource negotiator, (Proceedings of the 4th Annual Symposium on Cloud Computing (2013), ACM), 5
[248] Apache Hadoop YARN (2019), https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html, accessed 3 December 2019
[249] Thain, D.; Tannenbaum, T.; Livny, M., Distributed computing in practice: the Condor experience, Concurr. Comput.: Pract. Exper., 17, 2-4, 323-356 (2005)
[250] HTCondor - high throughput computing (2019), https://research.cs.wisc.edu/htcondor, accessed 3 December 2019
[251] IBM LSF (2016), https://www.ibm.com/support/knowledgecenter/en/SSETD4/product_welcome_platform_lsf.html, accessed 3 December 2019
[252] Joshi, P.; Babu, M. R., OpenLava: An open source scheduler for high performance computing, (2016 International Conference on Research Advances in Integrated Navigation Systems (RAINS) (2016)), 1-3
[253] Henderson, R. L., Job scheduling under the portable batch system, (Workshop on Job Scheduling Strategies for Parallel Processing (1995), Springer), 279-294
[254] PBS professional - open source project (2019), https://www.pbspro.org, accessed 3 December 2019
[255] Yoo, A. B.; Jette, M. A.; Grondona, M., Slurm: Simple Linux utility for resource management, (Workshop on Job Scheduling Strategies for Parallel Processing (2003), Springer), 44-60
[256] Slurm workload manager (2019), https://slurm.schedmd.com, accessed 3 December 2019
[257] TORQUE resource manager (2019), https://www.adaptivecomputing.com/products/torque, accessed 3 December 2019
[258] Kumar, R., Apache CloudStack: Open source infrastructure as a service cloud computing platform, Proc. Int. J. Adv. Eng. Technol. Manage. Appl. Sci., 111-116 (2014)
[259] Apache CloudStack - Open source cloud computing (2017), https://cloudstack.apache.org, accessed 3 December 2019
[260] Naik, N., Building a virtual system of systems using Docker Swarm in multiple clouds, (2016 IEEE International Symposium on Systems Engineering (ISSE) (2016)), 1-3
[261] Swarm mode overview (2019), https://docs.docker.com/engine/swarm, accessed 3 December 2019
[262] Nurmi, D., The Eucalyptus open-source cloud-computing system, (Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (2009), IEEE Computer Society), 124-131
[263] Eucalyptus (2018), https://www.eucalyptus.cloud, accessed 3 December 2019
[264] Hightower, K.; Burns, B.; Beda, J., Kubernetes: Up and Running: Dive Into the Future of Infrastructure (2017), O’Reilly Media Inc.
[265] Kubernetes (2019), https://kubernetes.io, accessed 3 December 2019
[266] Hindman, B., Mesos: A platform for fine-grained resource sharing in the data center, (NSDI, Vol. 11 (2011)), 22
[267] Apache Mesos (2018), http://mesos.apache.org, accessed 3 December 2019
[268] Toraldo, G., OpenNebula 3 Cloud Computing (2012), Packt Publishing Ltd
[269] OpenNebula (2019), https://opennebula.org, accessed 3 December 2019
[270] Sefraoui, O.; Aissaoui, M.; Eleuldj, M., OpenStack: toward an open-source solution for cloud computing, Int. J. Comput. Appl., 55, 3, 38-42 (2012)
[271] OpenStack (2019), https://www.openstack.org, accessed 3 December 2019
[272] RedHat OpenShift (2019), https://www.openshift.com, accessed 3 December 2019
[273] Wen, X., Comparison of open-source cloud management platforms: OpenStack and OpenNebula, (2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (2012)), 2457-2461
[274] Milojičić, D.; Llorente, I. M.; Montero, R. S., OpenNebula: A cloud management tool, IEEE Internet Comput., 15, 2, 11-14 (2011)
[275] Kubernetes and Docker Swarm compared (2017), https://platform9.com/blog/kubernetes-docker-swarm-compared, accessed 3 December 2019
[276] Kubernetes and Mesos compared (2016), https://platform9.com/blog/compare-kubernetes-vs-mesos, accessed 3 December 2019
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.