Abstract

<p>MapReduce is a data processing approach in which a single machine acts as a master, assigning map and reduce tasks to the other machines in the cluster. Technically, it can be considered a programming model, with an associated implementation, for processing and generating large data sets. The key concept behind MapReduce is that the programmer states the problem as two basic functions, map and reduce. Scalability is handled within the system rather than by the programmer. By placing restrictions on the programming style, MapReduce can take care of several functions automatically, such as fault tolerance, locality optimization, load balancing, and massive parallelization. The map function generates intermediate key/value pairs, which are then fed to the reduce workers through the underlying file system. The data received by the reduce workers is then merged by key to produce multiple output files for the user (Dean & Ghemawat, 2008). As a result, the programmer only needs to understand and write the code for this easy-to-understand functionality.</p>
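
The map/reduce split described in the abstract can be illustrated with a minimal single-process sketch. It is not the distributed system itself: there is no master, no worker machines, and no file system, and the names `map_word_count`, `reduce_word_count`, and `run_mapreduce` are hypothetical, chosen only for this illustration. It shows only the data flow: map emits intermediate key/value pairs, the pairs are grouped by key, and reduce merges the values for each key.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_word_count(doc_name: str, text: str) -> Iterator[tuple[str, int]]:
    """Map step: emit one (word, 1) pair per word in the document."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: Iterable[int]) -> int:
    """Reduce step: merge all values that share the same key."""
    return sum(counts)


def run_mapreduce(
    inputs: dict[str, str],
    map_fn: Callable[[str, str], Iterable[tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], int],
) -> dict[str, int]:
    """Run map, group the intermediate pairs by key, then run reduce.

    Assumption: everything happens in one process. A real deployment
    would spread the map and reduce calls across many machines.
    """
    intermediate: dict[str, list[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    return {k: reduce_fn(k, vs) for k, vs in sorted(intermediate.items())}


# Hypothetical input documents, used only for this example.
documents = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog the end",
}
result = run_mapreduce(documents, map_word_count, reduce_word_count)
```

In this sketch the programmer writes only `map_word_count` and `reduce_word_count`; the grouping step in `run_mapreduce` stands in for the work that the abstract says the system handles on the programmer's behalf.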

Keywords

Computer science, Scalability, Key (lock), Distributed computing, Terabyte, Programming paradigm, Scheduling (production processes), Function (biology), Set (abstract data type), Big data, Parallel computing, Database, Operating system, Programming language

Publication Info

Year: 2018
Type: article
Volume: 5
Issue: 5
Pages: 399-403
Citations: 2972
Access: Closed

Cite This

Jay B. Dean, Sanjay Ghemawat (2018). MapReduce: Simplified Data Processing on Large Cluster. International Journal of Research and Engineering, 5(5), 399-403. https://doi.org/10.21276/ijre.2018.5.5.4

Identifiers

DOI: 10.21276/ijre.2018.5.5.4