Abstract

Motivation: Profile HMMs are a powerful tool for modeling conserved motifs in proteins. These models are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome.

Results: We use PROSITE-like patterns as a filter to speed up the comparison between protein sequence and profile HMM. A set of patterns is designed starting from the HMM, and only sequences matching one of these patterns are compared to the HMM by full dynamic programming. We give an algorithm to design patterns with maximal sensitivity subject to a bound on the false positive rate. Experiments show that our patterns typically retain at least 90% of the sensitivity of the source HMM while accelerating search by an order of magnitude.

Availability: Contact the first author at the address below.

Contact: yanni@cse.wustl.edu
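The filtering step described under Results can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified PROSITE-style syntax (elements joined by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, optional `(n)` or `(n,m)` repeats), and the names `prosite_to_regex`, `filter_then_score`, and `score_fn` are hypothetical. The expensive profile HMM comparison is represented only by a caller-supplied function, and the pattern design algorithm itself is not shown.

```python
import re

# One pattern element: x, a residue, [allowed], or {forbidden}, with optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a compiled regex."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # forbidden residues
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return re.compile("".join(parts))


def filter_then_score(sequences, patterns, score_fn):
    """Run score_fn only on sequences that match at least one pattern.

    sequences: dict mapping sequence name to residue string.
    score_fn:  hypothetical stand-in for the full dynamic-programming
               comparison against the profile HMM.
    """
    compiled = [prosite_to_regex(p) for p in patterns]
    results = {}
    for name, seq in sequences.items():
        if any(rx.search(seq) for rx in compiled):
            results[name] = score_fn(seq)
    return results
```

In this sketch, any speedup comes from the regex scan being much cheaper than `score_fn`; sequences that match no pattern are never scored, which is also where sensitivity can be lost.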

Keywords

Hidden Markov model, Computer science, Artificial intelligence, Data mining

MeSH Terms

Algorithms; Amino Acid Motifs; Amino Acid Sequence; Artificial Intelligence; Computer Simulation; Conserved Sequence; Markov Chains; Models, Chemical; Models, Statistical; Molecular Sequence Data; Pattern Recognition, Automated; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid

Related Publications

Profile hidden Markov models.

The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-spe...

1998 Bioinformatics 5657 citations

Publication Info

Year: 2007
Type: article
Volume: 23
Issue: 2
Pages: e36-e43
Citations: 42
Access: Closed

Citation Metrics

OpenAlex: 42
Influential: 2
CrossRef: 21

Cite This

Yanni Sun, Jeremy Buhler (2007). Designing patterns for profile HMM search. Bioinformatics, 23(2), e36-e43. https://doi.org/10.1093/bioinformatics/btl323

Identifiers

DOI: 10.1093/bioinformatics/btl323
PMID: 17237102