Abstract
The goal of this work is to simplify parallel application development and thereby lower the learning barriers faced by non-experts. The approach is especially useful where there is little data parallelism for a compiler to recognize. The applications programmer need learn the intricacies of only one primary subroutine to obtain the full benefits of the parallel interface. The programmer defines a high-level concept, the task, that depends only on the application and not on any particular parallel library. A task is defined by three phases: (a) the task input, (b) sequential code that executes the task, and (c) any modifications of global variables that result from the task. In particular, side effects (changes to global variable values) must not occur in phase (b). Requiring the user to reorganize the computation in these terms allows us to present the applications programmer with a single global environment visible to all processors, whether on a symmetric multiprocessor (SMP) or a network of workstations (NOW), in the context of a master-slave architecture. Both a shared-memory implementation (running on an SGI or SUN Solaris architecture) and a NOW implementation (running on top of MPI) are described. The implementations were tested with a naive program for integer factorization and with a more sophisticated Todd-Coxeter coset enumeration. Integer factorization was chosen to exercise the major features of TOP-C in an unambiguous context.
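The three-phase task structure can be illustrated with a small sketch of naive integer factorization by trial division. This is not the TOP-C interface: it is written in Python, not C, and the function names (`generate_task_inputs`, `do_task`, `update_shared_data`, `factor`) and the `chunk` and `workers` parameters are hypothetical, chosen only for illustration. A thread pool stands in for the slave processes, so the sketch shows the organization of the computation and makes no claim about speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt


def generate_task_inputs(n, chunk=1000):
    """Phase (a): each task input is a half-open range [lo, hi) of candidate divisors."""
    limit = isqrt(n)
    for lo in range(2, limit + 1, chunk):
        yield (lo, min(lo + chunk, limit + 1))


def do_task(task_input, n):
    """Phase (b): sequential, side-effect-free work done by a worker.

    It only reads n and returns the divisors of n found in its range.
    """
    lo, hi = task_input
    return [d for d in range(lo, hi) if n % d == 0]


def update_shared_data(state, divisors):
    """Phase (c): only the master modifies the global state."""
    for d in divisors:
        while state["remaining"] % d == 0:
            state["remaining"] //= d
            state["factors"].append(d)


def factor(n, workers=4, chunk=1000):
    """Master loop (hypothetical): hand out inputs, then apply results in input order."""
    state = {"remaining": n, "factors": []}
    inputs = list(generate_task_inputs(n, chunk))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of the inputs, so smaller
        # divisors are divided out before larger (possibly composite) ones.
        for divisors in pool.map(lambda t: do_task(t, n), inputs):
            update_shared_data(state, divisors)
    if state["remaining"] > 1:
        # Whatever is left after removing all primes up to sqrt(n) is prime.
        state["factors"].append(state["remaining"])
    return state["factors"]
```

Because `do_task` never writes to `state`, workers can run in any order without coordination; all changes to the global data happen in `update_shared_data` on the master, which mirrors the restriction that side effects must not occur in phase (b).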
Publication Info
- Year: 1996
- Type: article
- Volume: 8
- Pages: 141-150
- Citations: 28
- Access: Closed
Identifiers
- DOI: 10.1109/hpdc.1996.546183