Scalability! But at what COST?
2015
218 citations
We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread. The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system’s scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads. We survey measurements of data-parallel systems recently reported in SOSP and OSDI, and find that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
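The definition above can be illustrated with a minimal sketch. It assumes a hardware configuration can be reduced to a core count and that a platform's reported results are (cores, runtime) pairs; the function name `cost` and all numbers are hypothetical illustrations, not measurements from the paper.

```python
from typing import Optional, Sequence, Tuple


def cost(
    single_thread_seconds: float,
    measurements: Sequence[Tuple[int, float]],
) -> Optional[int]:
    """Return the smallest core count at which the platform beats the
    single-threaded baseline, or None if no reported configuration does.

    Assumption: a "configuration" is summarized by its core count alone.
    """
    winning = [
        cores
        for cores, seconds in measurements
        if seconds < single_thread_seconds
    ]
    return min(winning) if winning else None


# Hypothetical runtimes (seconds) for one problem on one platform.
baseline_seconds = 300.0
reported = [(16, 900.0), (64, 400.0), (128, 250.0), (512, 120.0)]

platform_cost = cost(baseline_seconds, reported)
print(platform_cost)
```

In this made-up data the platform scales well from 16 to 512 cores, yet it first beats the single thread only at 128 cores. A return value of `None` corresponds to the case the abstract describes as underperforming one thread for all reported configurations.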
In recent years, the evolution of the IoT (Internet of Things) has resulted in the deployment of massive numbers of sensors in various fields, such as the Energy and Utility (E&U) in...
Innovations in Next-Generation Sequencing are enabling generation of DNA sequence data at ever faster rates and at very low cost. For example, the Illumina NovaSeq 6000 sequence...
The Portable, Extensible Toolkit for Scientific Computation (PETSc), is a suite of data structures and routines for the scalable (parallel) solution of scientific applications m...
The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DN...
Large-scale deep learning requires huge computational resources to train a multi-layer neural network. Recent systems propose using 100s to 1000s of machines to train networks w...