Abstract

We describe the timely dataflow model for distributed computation and its implementation in the Naiad system. The model supports stateful iterative and incremental computations. It enables both low-latency stream processing and high-throughput batch processing, using a new approach to coordination that combines asynchronous and fine-grained synchronous execution. We describe two of the programming frameworks built on Naiad: GraphLINQ for parallel graph processing, and differential dataflow for nested iterative and incremental computations. We show that a general-purpose system can achieve performance that matches, and sometimes exceeds, that of specialized systems.

Keywords

DataflowComputer scienceStream processingDataflow architectureParallel computingAsynchronous communicationComputationStateful firewallDistributed computingExecution modelLatency (audio)Model of computationGraphProgramming languageTheoretical computer scienceComputer network

Affiliated Institutions

Related Publications

Dryad

Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "ch...

2007 2446 citations

Publication Info

Year
2016
Type
article
Volume
59
Issue
10
Pages
75-83
Citations
36
Access
Closed

External Links

Social Impact

Social media, news, blog, policy document mentions

Citation Metrics

36
OpenAlex

Cite This

Derek G. Murray, Frank McSherry, Michael Isard et al. (2016). Incremental, iterative data processing with timely dataflow. Communications of the ACM , 59 (10) , 75-83. https://doi.org/10.1145/2983551

Identifiers

DOI
10.1145/2983551