Abstract

This paper describes a high-performance sampling architecture for inference of latent topic models on a cluster of workstations. Our system is over an order of magnitude faster than previous work and scales to hundreds of millions of documents and thousands of topics. The algorithm relies on a novel communication structure: a distributed (key, value) store is used to synchronize the sampler state between computers. This architecture entirely obviates the need for separate computation and synchronization phases; instead, disk, CPU, and network are used simultaneously to achieve high performance. We show that the architecture is fully general and extends easily to more sophisticated latent variable models such as n-grams and hierarchies.
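
The abstract's central idea, continuously reconciling sampler state through a shared (key, value) store rather than pausing for a synchronization phase, can be illustrated with a short sketch. The Python below is not the authors' implementation: the KVStore class, NUM_TOPICS, and the snapshot-based merge rule are illustrative assumptions, and a lock-guarded dict stands in for a distributed service.

```python
# Illustrative sketch only, not the paper's implementation.
# Each machine samples topics locally and accumulates count changes.
# A synchronization step pushes the local delta into a shared
# (key, value) store and pulls back the merged global counts, so
# sampling never has to stop for a cluster-wide barrier.
import threading

NUM_TOPICS = 4  # hypothetical; the paper reports thousands of topics

class KVStore:
    """Stand-in for a distributed (key, value) store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def merge(self, key, delta):
        """Atomically add a delta to the stored per-topic counts and
        return the merged result (an atomic read-modify-write)."""
        with self._lock:
            current = self._data.get(key, [0] * NUM_TOPICS)
            merged = [c + d for c, d in zip(current, delta)]
            self._data[key] = merged
            return merged

def synchronize(store, local, snapshot, word):
    """One out-of-band sync step for one word: push the counts
    accumulated since the last snapshot, then adopt the global view."""
    delta = [l - s for l, s in zip(local[word], snapshot[word])]
    merged = store.merge(word, delta)
    local[word] = list(merged)     # local state now reflects all machines
    snapshot[word] = list(merged)  # baseline for the next reconciliation

# Two machines contribute counts for the same word; after a second
# round of synchronization, machine 1 sees the combined totals.
store = KVStore()
m1 = {"local": {"river": [3, 0, 1, 0]}, "snap": {"river": [0] * NUM_TOPICS}}
m2 = {"local": {"river": [0, 2, 0, 0]}, "snap": {"river": [0] * NUM_TOPICS}}
synchronize(store, m1["local"], m1["snap"], "river")
synchronize(store, m2["local"], m2["snap"], "river")
synchronize(store, m1["local"], m1["snap"], "river")
print(m1["local"]["river"])  # [3, 2, 1, 0]
```

In the architecture the abstract describes, this reconciliation runs continuously in the background while sampling proceeds, which is what allows disk, CPU, and network to be used simultaneously instead of in alternating phases.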

Keywords

Computer science, Synchronization, Architecture, Computation, Distributed computing, Inference, Workstation, Key, Parallel computing, Theoretical computer science, Artificial intelligence, Algorithm, Operating system, Computer network

Publication Info

Year: 2010
Type: article
Volume: 3
Issue: 1-2
Pages: 703-710
Citations: 424
Access: Closed

Citation Metrics

424 citations (OpenAlex)

Cite This

Alexander J. Smola, Shravan Narayanamurthy (2010). An architecture for parallel topic models. Proceedings of the VLDB Endowment, 3(1-2), 703-710. https://doi.org/10.14778/1920841.1920931

Identifiers

DOI: 10.14778/1920841.1920931