Abstract
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif‐encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix‐turn‐helix DNA‐binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403–410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric β‐barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane‐spanning β‐strands. These β‐strands occur on the membrane interface (as opposed to the trimeric interface) of the β‐barrel. The broad conservation and structural location of these repeats suggest that they play important functional roles.
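To make the idea of Gibbs sampling for motif detection concrete, the sketch below shows a basic single-motif site sampler: each sequence holds one candidate site, and sites are resampled one sequence at a time from a profile built from the others. This is a simplified illustration and not the algorithm described in the paper, which also partitions regions into several motif models. The function names, the uniform background, the pseudocount, and the one-site-per-sequence restriction are all assumptions made for the example.

```python
import random


def build_profile(seqs, starts, width, skip, alphabet, pseudo=1.0):
    """Column-wise residue probabilities from every site except sequence `skip`."""
    counts = [{a: pseudo for a in alphabet} for _ in range(width)]
    for i, (seq, start) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[start + j]] += 1
    profile = []
    for col in counts:
        total = sum(col.values())
        profile.append({a: c / total for a, c in col.items()})
    return profile


def gibbs_site_sampler(seqs, width, iters=200, seed=0):
    """Return one sampled motif start per sequence (hypothetical helper)."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    background = 1.0 / len(alphabet)  # assumption: uniform background model
    # Start from random site positions.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            # Predictive update: profile from all other sequences' sites.
            profile = build_profile(seqs, starts, width, i, alphabet)
            weights = []
            for start in range(len(seq) - width + 1):
                ratio = 1.0
                for j in range(width):
                    ratio *= profile[j][seq[start + j]] / background
                weights.append(ratio)
            # Sampling step: draw a new site in proportion to its weight.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    toy = ["ACGTTGACCA", "TTGACGGAAC", "CCATTGACTT"]
    print(gibbs_site_sampler(toy, width=5, iters=50, seed=1))
```

Because the alphabet is taken from the input, the same sketch runs on protein sequences; the result is a sample rather than a guaranteed optimum, so several runs with different seeds are usually compared.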
Related Publications
The structure of interfaces between subunits of dimeric and tetrameric proteins
The structures of the interfaces of nine dimeric and nine tetrameric proteins have been analyzed and have been seen to follow general principles. These interfaces are combinatio...
GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions
Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence ...
HD-Zip proteins: members of an Arabidopsis homeodomain protein superfamily.
Homeobox genes encode a large family of homeodomain proteins in animal systems. To test whether such genes are also abundant in higher plants, degenerate oligonucleotides comple...
GAME: detecting cis-regulatory elements using a genetic algorithm
Motivation: Identification of transcription factor binding sites is an important aspect of the analysis of genetic regulation. Many programs have been developed for t...
FANMOD: a tool for fast network motif detection
Summary: Motifs are small connected subnetworks that a network displays in significantly higher frequencies than would be expected for a random network. They have recen...
Publication Info
- Year: 1995
- Type: article
- Volume: 4
- Issue: 8
- Pages: 1618-1632
- Citations: 390
- Access: Closed
Identifiers
- DOI: 10.1002/pro.5560040820