MapReduce | RDL Research Database

Abstract

MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
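As a rough illustration of the model described in the abstract, the sketch below runs a word count through a user-supplied map function and reduce function. It is a minimal single-process sketch, not the implementation the paper describes: the names `run_mapreduce`, `map_word_count`, and `reduce_word_count` and the sample documents are hypothetical, and the parallelization, failure handling, and scheduling performed by the real runtime are deliberately left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit an intermediate (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: Iterable[int]) -> int:
    """Reduce step: merge all values emitted for one key into a total."""
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    map_fn: Callable[[str, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver (assumption: everything fits on one machine)."""
    # Group intermediate values by key; a distributed runtime would do this
    # grouping across machines between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for in_key, in_value in inputs.items():
        for out_key, out_value in map_fn(in_key, in_value):
            groups[out_key].append(out_value)
    # Apply the reduce function once per distinct intermediate key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical sample input: document name -> document contents.
documents = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog and the fox",
}
word_counts = run_mapreduce(documents, map_word_count, reduce_word_count)
```

Only the two small functions are specific to word counting; the driver is generic, which mirrors the division of labor the abstract describes between user code and the runtime.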

Keywords

Petabyte, Computer science, Computation, Variety (cybernetics), Programming paradigm, Function (biology), Parallel computing, Operating system, Distributed computing, Big data, Artificial intelligence, Programming language

Affiliated Institutions

Google (United States)

Related Publications

MapReduce: Simplified Data Processing on Large Cluster

Jay B. Dean , Sanjay Ghemawat

Abstract - MapReduce is a data processing approach, where a single machine acts as a master, assigning map/reduce tasks to all the other machines attached in the cluste...

2018 INTERNATIONAL JOURNAL OF RESEARCH AND... 2972 citations

The Hadoop Distributed File System

Konstantin V. Shvachko , Hairong Kuang , Sanjay Radia +1 more

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cl...

2010 4766 citations

Accelerating Parallel Maximum Likelihood-Based Phylogenetic Tree Calculations Using Subtree Equality Vectors

Alexandros Stamatakis , Thomas Ludwig , Harald Meier +1 more

Heuristics for calculating phylogenetic trees for large sets of aligned rRNA sequences based on the maximum likelihood method are computationally expensive. The core of most p...

2002 Conference on High Performance Comput... 26 citations

CloneCloud

Byung-Gon Chun , Sunghwan Ihm , Petros Maniatis +2 more

Mobile applications are becoming increasingly ubiquitous and provide ever richer functionality on mobile devices. At the same time, such devices often enjoy strong connectivity ...

2011 1871 citations

GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

Sander Pronk , Szilárd Páll , Roland Schulz +9 more

Abstract Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changi...

2013 Bioinformatics 7280 citations

Publication Info

Year: 2008
Type: article
Volume: 51
Issue: 1
Pages: 107-113
Citations: 18309
Access: Closed

External Links

https://doi.org/10.1145/1327452.1327492

Citation Metrics

18309 (OpenAlex)

Cite This

APA Style

                            
Dean, J., & Ghemawat, S. (2008). MapReduce. Communications of the ACM, 51(1), 107-113. https://doi.org/10.1145/1327452.1327492

Identifiers

DOI: 10.1145/1327452.1327492