Abstract

MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
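The division of labor the abstract describes, with user-supplied map and reduce functions and a runtime that handles everything else, can be illustrated with a minimal single-process sketch. This is not Google's implementation: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the toy runtime below does no parallelization, fault handling, or network scheduling. It only shows the grouping of intermediate values by key that sits between the two user functions, using word counting as the example task.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """User-supplied map: emit (word, 1) for every word in one document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """User-supplied reduce: combine all values emitted for one key."""
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Toy sequential stand-in for the runtime (assumption: everything fits in memory)."""
    # Map phase: apply the mapper to each input record and
    # group the intermediate values by their intermediate key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: apply the reducer once per distinct intermediate key.
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}


# Hypothetical input documents, used only for illustration.
docs = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog and the fox",
}
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

In a real deployment the two loops in `run_mapreduce` would be spread across many machines; the user-facing contract (two small functions) is the part this sketch is meant to convey.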

Keywords

Petabyte, Computer science, Computation, Variety (cybernetics), Programming paradigm, Function (biology), Parallel computing, Operating system, Distributed computing, Big data, Artificial intelligence, Programming language

Publication Info

Year: 2008
Type: article
Volume: 51
Issue: 1
Pages: 107-113
Citations: 18309
Access: Closed

Citation Metrics

18309 (OpenAlex)

Cite This

Jeffrey Dean, Sanjay Ghemawat (2008). MapReduce. Communications of the ACM, 51(1), 107-113. https://doi.org/10.1145/1327452.1327492

Identifiers

DOI: 10.1145/1327452.1327492