Abstract

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!.
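To make the streaming access model in the abstract concrete, here is a minimal client sketch using Hadoop's public FileSystem API (org.apache.hadoop.fs). It is an illustration added for this summary, not code from the paper; the NameNode URI and file path are hypothetical placeholders.

// Minimal sketch: streaming a file's contents from HDFS through the
// standard Hadoop FileSystem API. Not from the paper; the cluster URI
// and path below are hypothetical placeholders.
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsStreamingRead {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical NameNode endpoint; a real client usually takes
        // this from its cluster configuration (fs.defaultFS).
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path file = new Path("/data/sample.log"); // hypothetical path
        byte[] buf = new byte[128 * 1024];
        long total = 0;
        try (FSDataInputStream in = fs.open(file)) {
            int n;
            while ((n = in.read(buf)) > 0) {
                total += n; // application logic would consume buf[0..n) here
            }
        }
        System.out.println("Read " + total + " bytes from " + file);
    }
}

Reads proceed block by block: the client asks the NameNode for block locations and then streams data directly from the DataNodes holding the replicas, which is what lets aggregate read bandwidth scale with cluster size.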

Keywords

Computer science, Petabyte, Server, Distributed File System, Operating system, Distributed data store, File system, File server, Distributed database, Database, Distributed computing, Big data

Publication Info

Year: 2010
Type: Article
Pages: 1-10
Citations: 4766 (OpenAlex)
Access: Closed

Cite This

Konstantin V. Shvachko, Hairong Kuang, Sanjay Radia et al. (2010). The Hadoop Distributed File System. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 1-10. https://doi.org/10.1109/msst.2010.5496972

Identifiers

DOI: 10.1109/msst.2010.5496972