Designing patterns for profile HMM search

Yanni Sun; Jeremy Buhler

doi:10.1093/bioinformatics/btl323

Abstract

Motivation: Profile HMMs are a powerful tool for modeling conserved motifs in proteins. These models are widely used by search tools to classify new protein sequences into families based on domain architecture. However, the proliferation of known motifs and new proteomic sequence data poses a computational challenge for search, requiring days of CPU time to annotate an organism's proteome.

Results: We use PROSITE-like patterns as a filter to speed up the comparison between protein sequence and profile HMM. A set of patterns is designed starting from the HMM, and only sequences matching one of these patterns are compared to the HMM by full dynamic programming. We give an algorithm to design patterns with maximal sensitivity subject to a bound on the false positive rate. Experiments show that our patterns typically retain at least 90% of the sensitivity of the source HMM while accelerating search by an order of magnitude.

Availability: Contact the first author at the address below.

Contact: yanni@cse.wustl.edu
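The filtering step described in the abstract can be illustrated with a minimal sketch. It covers only the filter itself (keep a sequence if it matches at least one pattern), not the authors' pattern-design algorithm, which is not reproduced on this page. The pattern `C-x(2,4)-C-[LIVM]`, the sequences, and the function names `prosite_to_regex` and `filter_sequences` are all hypothetical, and the translator handles only a common subset of PROSITE syntax.

```python
import re

# One PROSITE-style element: optional '<' anchor, a residue / x / [set] / {excluded set},
# an optional repeat count (n) or (n,m), and an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (subset of the syntax) into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":                      # x matches any residue
            core = "."
        elif core.startswith("{"):           # {ABC} means any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        if lo is not None:                   # (n) or (n,m) repetition
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


def filter_sequences(sequences: dict, patterns: list) -> list:
    """Return names of sequences matching at least one pattern.

    Only these candidates would then be passed to the expensive
    full dynamic-programming comparison against the profile HMM.
    """
    compiled = [re.compile(prosite_to_regex(p)) for p in patterns]
    return [
        name
        for name, seq in sequences.items()
        if any(rx.search(seq) for rx in compiled)
    ]


# Hypothetical toy data, for illustration only.
sequences = {
    "seq1": "MKTCAACLPQ",
    "seq2": "MKTAAAALPQ",
    "seq3": "GGCWWWWCIKK",
}
candidates = filter_sequences(sequences, ["C-x(2,4)-C-[LIVM]"])
print(candidates)
```

In this toy run only `seq1` and `seq3` would go on to the full HMM comparison. The speedup in such a scheme comes from the regex scan being much cheaper than dynamic programming, at the cost of missing any true family member that matches none of the patterns.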

Keywords

Hidden Markov model; Computer science; Artificial intelligence; Data mining

MeSH Terms

Algorithms; Amino Acid Motifs; Amino Acid Sequence; Artificial Intelligence; Computer Simulation; Conserved Sequence; Markov Chains; Models, Chemical; Models, Statistical; Molecular Sequence Data; Pattern Recognition, Automated; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid

Affiliated Institutions

Washington University in St. Louis, US

Related Publications

Designing Patterns and Profiles for Faster HMM Search

Yanni Sun, Jeremy Buhler

Profile HMMs are powerful tools for modeling conserved motifs in proteins. They are widely used by search tools to classify new protein sequences into families based on domain a...

2008 IEEE/ACM Transactions on Computationa... 10 citations

Hidden Markov Models in Computational Biology

Kimmen Sjölander, David Haussler, Anders Krogh +2 more

Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. Thes...

1994 Journal of Molecular Biology 1934 citations

Profile hidden Markov models.

Sean R. Eddy

The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-spe...

1998 Bioinformatics 5657 citations

A profile hidden Markov model for signal peptides generated by HMMER

Zemin Zhang, William I. Wood

Summary: Although the HMMER package is widely used to produce profile hidden Markov models (profile HMMs) for protein domains, it has been difficult to create a profile...

2003 Bioinformatics 113 citations

Protein homology detection by HMM–HMM comparison

Johannes Söding

Motivation: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution. Results: We have gene...

2004 Bioinformatics 2470 citations

Publication Info

Year: 2007
Type: article
Volume: 23
Issue: 2
Pages: e36-e43
Citations: 42
Access: Closed

Cite This

APA Style

Yanni Sun, Jeremy Buhler (2007). Designing patterns for profile HMM search. Bioinformatics, 23(2), e36-e43. https://doi.org/10.1093/bioinformatics/btl323

Identifiers

DOI: 10.1093/bioinformatics/btl323
PMID: 17237102
