MapReduce
swMATH ID: 546
Software Authors: Pan, Jie; Magoulès, Frédéric; Le Biannic, Yann
Description: MapReduce is a new parallel programming model initially developed for large-scale web content processing. Data analysis meets the issue of how to do calculation over extremely large datasets. The arrival of MapReduce provides a chance to utilize commodity hardware for massively parallel data analysis applications. The translation and optimization from relational algebra operators to MapReduce programs is still an open and dynamic research field. In this paper, we focus on a special type of data analysis query, namely multiple group by query. We first study the communication cost of the MapReduce model, then we give an initial implementation of multiple group by query. We then propose an optimized version which addresses and improves the communication cost issues. Our optimized version shows a better accelerating ability and a better scalability than the other version.
Homepage: http://en.wikipedia.org/wiki/MapReduce
Related Software: Hadoop; Spark; Dryad; Apache Spark; UCI-ml; Haskell; Hive; Pregel; GitHub; Bigtable; CUDA; Pegasus; OpenCL; Sun Grid Engine; R; GraphLab; Amazon EC2; DryadLINQ; Chapel; MongoDB
Cited in: 198 Publications
Standard Articles
1 Publication describing the Software, including 1 Publication in zbMATH
Implementing and optimizing multiple group by query in a MapReduce approach. Zbl 1206.68082. Pan, Jie; Magoulès, Frédéric; Le Biannic, Yann. Year: 2010
Cited by 631 Authors
3 Chen, Jinjun 3 Jiang, Yiwei 3 Ketsman, Bas 3 Koutris, Paraschos 3 Xu, Yinfeng 3 Zhou, Ping 2 Afrati, Foto N. 2 Basin, David A. 2 Bateni, MohammadHossein 2 Bellodi, Elena 2 Berlińska, Joanna 2 Czumaj, Artur 2 del Río, Sara 2 Drozdowski, Maciej 2 Gleich, David F. 2 Gudes, Ehud 2 Hajiaghayi, Mohammad Taghi 2 Harchol-Balter, Mor 2 Herrera, Francisco 2 Kasahara, Shoji 2 Kersting, Kristian 2 Klaedtke, Felix 2 Kohn, Robert J.
2 Krause, Andreas 2 Li, Yantao 2 Liu, Chang 2 Masuyama, Hiroyuki 2 Mirrokni, Vahab S. 2 Montealegre, Pedro 2 Nepal, Surya 2 Nielsen, Thomas D. 2 Quiroz, Matias 2 Rapaport, Ivan 2 Riguzzi, Fabrizio 2 Sharma, Shantanu 2 Suciu, Dan Mircea 2 Takahashi, Yutaka 2 Tao, Jie 2 Todinca, Ioan 2 Tran, Minh Ngoc 2 Ullman, Jeffrey David 2 Villani, Mattias 2 Wang, Lizhe 2 Wang, Yuping 2 Wu, Weili 2 Xia, Dawen 2 Xin, Junchang 2 Zhang, Xuyun 2 Zhang, Zili 2 Zhou, Wei 2 Zhu, Yuqing 1 Achten, Peter 1 Afzal, Asif 1 Agapito, Giuseppe 1 Agrawal, Nikunj 1 Ahmadi, Babak 1 Ahmadi, Saba 1 Ahmed, Chowdhury Farhan 1 Ahmed, Reaz 1 Albarghouthi, Aws 1 Alham, Nasullah Khalid 1 Altimiras, Francisco 1 Alvarez, Javier 1 Alvarez, Pol 1 Amde, Manish 1 Annoni, Jennifer 1 Ansari, Zahid A. 1 Apishev, M. A. 1 Arias, Jacinto 1 Atar, Rami 1 Audrito, Giorgio 1 Averbuch, Amir Z. 1 Aydin, Kevin 1 Ayed, Rahma Ben 1 Babaee, Hessam 1 Badia, Rosa Maria 1 Bai, Mei 1 Balasubramanian, Bharath 1 Balmin, Andrey 1 Ban, Tao 1 Banyal, Rohitash Kumar 1 Bao, Liang 1 Bauckhage, Christian 1 Bawakid, Abdullah 1 Beal, Jacob 1 Becker, Florent 1 Behnezhad, Soheil 1 Bellet, Aurélien 1 Bengtson, Jesper 1 Bennet, Colin 1 Bermanis, Amit 1 Berthold, Michael R. 1 Beyan, Oya Deniz 1 Bilal, Kashif 1 Biletskyy, Borys 1 Blass, Erik-Oliver 1 Borodin, Allan B. 
1 Bottou, Léon 1 Boutaba, Raouf 1 Bowers, Shawn ...and 531 more Authors
Cited in 96 Serials
10 Journal of Computer and System Sciences; 8 Machine Learning; 7 Information Sciences; 6 Theoretical Computer Science; 6 Algorithms; 5 Computing; 5 IEEE Transactions on Computers; 5 Mathematical Problems in Engineering; 5 Theory of Computing Systems; 5 Journal of Combinatorial Optimization; 5 Journal of Machine Learning Research (JMLR); 4 SIAM Journal on Scientific Computing; 4 Journal of Functional Programming; 3 International Journal of Approximate Reasoning; 3 Queueing Systems; 3 European Journal of Operational Research; 3 Complexity; 3 Logical Methods in Computer Science; 2 ACM Transactions on Database Systems; 2 Fuzzy Sets and Systems; 2 Operations Research; 2 Programming and Computer Software; 2 SIAM Journal on Computing; 2 Information and Computation; 2 Computers & Operations Research; 2 Formal Aspects of Computing; 2 Distributed Computing; 2 Annals of Mathematics and Artificial Intelligence; 2 Data Mining and Knowledge Discovery; 2 Higher-Order and Symbolic Computation; 2 Sādhanā; 2 Journal of Industrial and Management Optimization; 2 Optimization Letters; 2 Statistical Analysis and Data Mining; 2 Journal of Computational and Graphical Statistics; 2 Computer Science Review; 1 Advances in Applied Probability; 1 Artificial Intelligence; 1 Computers & Mathematics with Applications; 1 Journal of the Franklin Institute; 1 Physics Reports; 1 The Annals of Statistics; 1 Applied Mathematics and Computation; 1 Journal of the American Statistical Association; 1 Journal of Multivariate Analysis; 1 Mathematics of Operations Research; 1 Advances in Applied Mathematics; 1 Science of Computer Programming; 1 Parallel Computing; 1 Constructive Approximation; 1 Statistical Science; 1 New Generation Computing; 1 Algorithmica; 1 Computational Mechanics; 1 Asia-Pacific Journal of Operational Research; 1 Journal of Parallel and Distributed Computing; 1 Neural Networks; 1 International Journal of Foundations of Computer Science; 1 Journal of Global Optimization; 1 Computational Statistics; 1 International Journal of Robust and Nonlinear Control; 1 Foundations of Computing and Decision Sciences; 1 Journal of Nonlinear Science; 1 Cybernetics and Systems Analysis; 1 Formal Methods in System Design; 1 Applied and Computational Harmonic Analysis; 1 Statistica Sinica; 1 Journal of Scheduling; 1 Journal of the ACM; 1 Discrete Dynamics in Nature and Society; 1 Journal of Applied Statistics; 1 Journal of Discrete Mathematical Sciences & Cryptography; 1 RAIRO. Operations Research; 1 Archives of Computational Methods in Engineering; 1 Theory and Practice of Logic Programming; 1 4OR; 1 ACM Journal of Experimental Algorithmics; 1 Internet Mathematics; 1 International Journal of Wavelets, Multiresolution and Information Processing; 1 Parallel Processing Letters; 1 Advances in Data Analysis and Classification. ADAC; 1 Electronic Journal of Statistics; 1 The Annals of Applied Statistics; 1 Foundations and Trends in Databases; 1 Vestnik Yuzhno-Ural’skogo Gosudarstvennogo Universiteta. Seriya Matematicheskoe Modelirovanie i Programmirovanie; 1 Journal of Algorithms & Computational Technology; 1 Sankhyā. Series A; 1 ACM Transactions on Algorithms; 1 Statistics and Computing; 1 Information and Inference; 1 Frontiers of Computer Science; 1 Bayesian Analysis; 1 ISRN Biomathematics; 1 Journal of Optimization; 1 Journal of Logical and Algebraic Methods in Programming; 1 Journal of Membrane Computing
Cited in 19 Fields
153 Computer science (68-XX)
42 Statistics (62-XX)
41 Operations research, mathematical programming (90-XX)
20 Combinatorics (05-XX)
14 Numerical analysis (65-XX)
10 Probability theory and stochastic processes (60-XX)
10 Game theory, economics, finance, and other social and behavioral sciences (91-XX)
7 Mathematical logic and foundations (03-XX)
4 Information and communication theory, circuits (94-XX)
3 Systems theory; control (93-XX)
2 Biology and other natural sciences (92-XX)
1 History and biography (01-XX)
1 General algebraic systems (08-XX)
1 Linear and multilinear algebra; matrix theory (15-XX)
1 Partial differential equations (35-XX)
1 Dynamical systems and ergodic theory (37-XX)
1 Approximations and expansions (41-XX)
1 Operator theory (47-XX)
1 Fluid mechanics (76-XX)