MapReduce
swMATH ID: 546
Software Authors: Pan, Jie; Magoulès, Frédéric; Le Biannic, Yann
Description: MapReduce is a new parallel programming model initially developed for large-scale web content processing. Data analysis faces the problem of how to perform computations over extremely large datasets. MapReduce makes it possible to use commodity hardware for massively parallel data analysis applications. The translation and optimization of relational algebra operators into MapReduce programs is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the multiple group by query. We first study the communication cost of the MapReduce model, then we give an initial implementation of the multiple group by query. We then propose an optimized version that addresses the communication cost issues. Our optimized version shows better speed-up and better scalability than the initial version.
Homepage: http://en.wikipedia.org/wiki/MapReduce
Related Software: Hadoop; Spark; Apache Spark; Dryad; UCI-ml; GitHub; Haskell; Hive; Pregel; CUDA; Bigtable; R; Pegasus; OpenCL; Twister; Sun Grid Engine; GraphLab; Amazon EC2; DryadLINQ; LIBSVM
Cited in: 214 Documents
Standard Articles: 1 publication describing the software, including 1 publication in zbMATH
Implementing and optimizing multiple group by query in a MapReduce approach. Zbl 1206.68082. Pan, Jie; Magoulès, Frédéric; Le Biannic, Yann (2010)
Cited by 677 Authors
3 Chen, Jinjun 3 Czumaj, Artur 3 Jiang, Yiwei 3 Ketsman, Bas 3 Koutris, Paraschos 3 Xu, Yinfeng 3 Zhou, Ping 2 Afrati, Foto N. 2 Audrito, Giorgio 2 Basin, David A. 2 Bateni, MohammadHossein 2 Bellodi, Elena 2 Berlińska, Joanna 2 Damiani, Ferruccio 2 del Río, Sara 2 Drozdowski, Maciej 2 Du, Ding-Zhu 2 Gleich, David F.
2 Gudes, Ehud 2 Hajiaghayi, Mohammad Taghi 2 Harchol-Balter, Mor 2 Herrera, Francisco 2 Kasahara, Shoji 2 Kersting, Kristian 2 Klaedtke, Felix 2 Kohn, Robert J. 2 Konecny, Jan 2 Krajca, Petr 2 Krause, Andreas 2 Li, Yantao 2 Liu, Chang 2 Masuyama, Hiroyuki 2 Mirrokni, Vahab S. 2 Montealegre, Pedro 2 Nepal, Surya 2 Nielsen, Thomas D. 2 Quiroz, Matias 2 Rapaport, Ivan 2 Riguzzi, Fabrizio 2 Sharma, Shantanu 2 Suciu, Dan Mircea 2 Takahashi, Yutaka 2 Tao, Jie 2 Todinca, Ioan 2 Tran, Minh Ngoc 2 Ullman, Jeffrey David 2 Villani, Mattias 2 Viroli, Mirko 2 Wang, Lizhe 2 Wang, Yuping 2 Wu, Weili 2 Xia, Dawen 2 Xin, Junchang 2 Zhang, Xuyun 2 Zhang, Zili 2 Zhou, Wei 2 Zhu, Yuqing 1 Achten, Peter 1 Afzal, Asif 1 Agapito, Giuseppe 1 Agrawal, Nikunj 1 Ahmadi, Babak 1 Ahmadi, Saba 1 Ahmed, Chowdhury Farhan 1 Ahmed, Reaz 1 Albarghouthi, Aws 1 Alham, Nasullah Khalid 1 Alonso-Betanzos, Amparo 1 Altimiras, Francisco 1 Alvarez, Javier 1 Alvarez, Pol 1 Amde, Manish 1 Annoni, Jennifer 1 Ansari, Zahid A. 1 Antonazzo, Filippo 1 Apishev, M. A. 1 Arias, Jacinto 1 Atar, Rami 1 Averbuch, Amir Z. 1 Aydin, Kevin 1 Ayed, Rahma Ben 1 Babaee, Hessam 1 Badia, Rosa Maria 1 Bai, Mei 1 Balasubramanian, Bharath 1 Balmin, Andrey 1 Ban, Tao 1 Banyal, Rohitash Kumar 1 Bao, Liang 1 Bauckhage, Christian 1 Bawakid, Abdullah 1 Beal, Jacob 1 Becker, Florent 1 Behnezhad, Soheil 1 Bellet, Aurélien 1 Bengtson, Jesper 1 Bennet, Colin 1 Bermanis, Amit 1 Berthold, Michael R. 
1 Beyan, Oya Deniz ...and 577 more Authors
Cited in 104 Serials
10 Journal of Computer and System Sciences 8 Information Sciences 8 Machine Learning 6 Theoretical Computer Science 6 Algorithms 5 Computing 5 IEEE Transactions on Computers 5 Mathematical Problems in Engineering 5 Theory of Computing Systems 5 Journal of Combinatorial Optimization 5 Journal of Machine Learning Research (JMLR) 4 Queueing Systems 4 SIAM Journal on Scientific Computing 4 Journal of Functional Programming 4 Logical Methods in Computer Science 3 Operations Research 3 SIAM Journal on Computing 3 International Journal of Approximate Reasoning 3 European Journal of Operational Research 3 Complexity 2 ACM Transactions on Database Systems 2 Fuzzy Sets and Systems 2 Programming and Computer Software 2 Information and Computation 2 Computers & Operations Research 2 Formal Aspects of Computing 2 Distributed Computing 2 Annals of Mathematics and Artificial Intelligence 2 Data Mining and Knowledge Discovery 2 Higher-Order and Symbolic Computation 2 Sādhanā 2 Journal of Industrial and Management Optimization 2 Optimization Letters 2 Statistical Analysis and Data Mining 2 Foundations and Trends in Databases 2 Journal of Computational and Graphical Statistics 2 Statistics and Computing 2 Computer Science Review 1 Advances in Applied Probability 1 The American Statistician 1 Artificial Intelligence 1 Computers & Mathematics with Applications 1 Computer Physics Communications 1 Journal of the Franklin Institute 1 Physics Reports 1 The Annals of Statistics 1 Applied Mathematics and Computation 1 International Statistical Review 1 Journal of the American Statistical Association 1 Journal of Multivariate Analysis 1 Journal of Optimization Theory and Applications 1 Mathematics of Operations Research 1 Advances in Applied Mathematics 1 Science of Computer Programming 1 Parallel Computing 1 Constructive Approximation 1 Statistical Science 1 New Generation Computing 1 Algorithmica 1 Computational
Mechanics 1 Asia-Pacific Journal of Operational Research 1 Journal of Parallel and Distributed Computing 1 Neural Networks 1 International Journal of Foundations of Computer Science 1 Journal of Global Optimization 1 Computational Statistics 1 Mathematical Programming. Series A. Series B 1 International Journal of Robust and Nonlinear Control 1 Foundations of Computing and Decision Sciences 1 Journal of Nonlinear Science 1 Cybernetics and Systems Analysis 1 Formal Methods in System Design 1 Applied and Computational Harmonic Analysis 1 Statistica Sinica 1 Journal of Scheduling 1 Journal of the ACM 1 Discrete Dynamics in Nature and Society 1 Journal of Applied Statistics 1 Journal of Discrete Mathematical Sciences & Cryptography 1 Probability in the Engineering and Informational Sciences 1 RAIRO. Operations Research 1 International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 1 Archives of Computational Methods in Engineering 1 Theory and Practice of Logic Programming 1 4OR 1 ACM Journal of Experimental Algorithmics 1 Internet Mathematics 1 International Journal of Wavelets, Multiresolution and Information Processing 1 Parallel Processing Letters 1 Advances in Data Analysis and Classification. ADAC 1 Electronic Journal of Statistics 1 The Annals of Applied Statistics 1 Vestnik Yuzhno-Ural’skogo Gosudarstvennogo Universiteta. Seriya Matematicheskoe Modelirovanie i Programmirovanie 1 Journal of Algorithms & Computational Technology 1 Sankhyā. 
Series A 1 ACM Transactions on Algorithms 1 Information and Inference 1 Frontiers of Computer Science 1 Bayesian Analysis 1 EURO Journal on Computational Optimization ...and 4 more Serials
Cited in 20 Fields
162 Computer science (68-XX) 47 Operations research, mathematical programming (90-XX) 44 Statistics (62-XX) 21 Combinatorics (05-XX) 15 Numerical analysis (65-XX) 12 Probability theory and stochastic processes (60-XX) 11 Game theory, economics, finance, and other social and behavioral sciences (91-XX) 7 Mathematical logic and foundations (03-XX) 4 Information and communication theory, circuits (94-XX) 3 Systems theory; control (93-XX) 2 Linear and multilinear algebra; matrix theory (15-XX) 2 Biology and other natural sciences (92-XX) 1 History and biography (01-XX) 1 General algebraic systems (08-XX) 1 Partial differential equations (35-XX) 1 Dynamical systems and ergodic theory (37-XX) 1 Approximations and expansions (41-XX) 1 Operator theory (47-XX) 1 Fluid mechanics (76-XX) 1 Relativity and gravitational theory (83-XX)