Abstract
The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successive motifs. The algorithm requires only a set of unaligned sequences and a number specifying the width of the motifs as input. It returns a model of each motif and a threshold which together can be used as a Bayes-optimal classifier for searching for occurrences of the motif in other databases. The algorithm estimates how many times each motif occurs in each sequence in the dataset and outputs an alignment of the occurrences of the motif. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset.
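The core fitting step described above can be sketched as follows. This is a minimal illustration, assuming DNA input restricted to the letters A, C, G, and T, a position-specific multinomial for the motif component, and a single position-independent multinomial for the background component. The function names, the pseudocount, the random initialization, and the fixed iteration count are all assumptions made for the example and are not taken from the paper. Probabilistic erasing for finding successive motifs is not shown.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: input sequences contain only these letters


def subsequences(seqs, width):
    """Pool every overlapping window of length `width` from all sequences."""
    return [s[i:i + width] for s in seqs for i in range(len(s) - width + 1)]


def fit_two_component_mixture(seqs, width, n_iter=50, seed=0, pseudo=0.1):
    """Fit a motif-vs-background mixture to width-`width` windows with EM.

    Returns (motif, background, mix, z):
      motif[j][c]    probability of letter c at motif position j
      background[c]  position-independent probability of letter c
      mix            prior probability that a window is a motif occurrence
      z[i]           posterior probability that window i is a motif occurrence
    """
    rng = random.Random(seed)
    windows = subsequences(seqs, width)

    # Hypothetical initialization: random motif columns, uniform background.
    motif = []
    for _ in range(width):
        raw = [rng.random() + 0.5 for _ in ALPHABET]
        total = sum(raw)
        motif.append({c: r / total for c, r in zip(ALPHABET, raw)})
    background = {c: 1.0 / len(ALPHABET) for c in ALPHABET}
    mix = 0.1
    z = [0.0] * len(windows)

    for _ in range(n_iter):
        # E-step: posterior that each window came from the motif component.
        for i, w in enumerate(windows):
            log_m = math.log(mix) + sum(
                math.log(motif[j][c]) for j, c in enumerate(w))
            log_b = math.log(1.0 - mix) + sum(
                math.log(background[c]) for c in w)
            top = max(log_m, log_b)  # subtract the max for numerical stability
            pm = math.exp(log_m - top)
            pb = math.exp(log_b - top)
            z[i] = pm / (pm + pb)

        # M-step: re-estimate the mixing weight, clamped away from 0 and 1.
        mix = min(max(sum(z) / len(windows), 1e-6), 1.0 - 1e-6)

        # M-step: motif columns from posterior-weighted letter counts.
        for j in range(width):
            counts = {c: pseudo for c in ALPHABET}
            for zi, w in zip(z, windows):
                counts[w[j]] += zi
            norm = sum(counts.values())
            motif[j] = {c: counts[c] / norm for c in ALPHABET}

        # M-step: background from the complementary weights (1 - z).
        bg_counts = {c: pseudo for c in ALPHABET}
        for zi, w in zip(z, windows):
            for c in w:
                bg_counts[c] += 1.0 - zi
        norm = sum(bg_counts.values())
        background = {c: bg_counts[c] / norm for c in ALPHABET}

    return motif, background, mix, z
```

Calling `fit_two_component_mixture(seqs, width=5)` on a list of strings returns the fitted motif columns, the background distribution, the mixing weight, and one posterior per window. Under this model, labeling a window as a motif occurrence when its posterior exceeds 0.5 is equivalent to thresholding the motif-to-background log-odds at log((1 - mix) / mix). Because EM converges only to a local optimum, results depend on the starting point, so the sketch makes no claim about which motif is recovered.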
Publication Info
- Year: 1994
- Type: article
- Volume: 2
- Pages: 28-36
- Citations: 5083
- Access: Closed