A comparison of profile hidden Markov model procedures for remote homology detection

2002 Nucleic Acids Research 166 citations

Abstract

Profile hidden Markov models (HMMs) are amongst the most successful procedures for detecting remote homology between proteins. There are two popular profile HMM programs, HMMER and SAM. Little is known about their performance relative to each other and to the recently improved version of PSI-BLAST. Here we compare the two programs to each other and to non-HMM methods, to determine their relative performance and the features that are important for their success. The quality of the multiple sequence alignments used to build models was the most important factor affecting the overall performance of profile HMMs. The SAM T99 procedure is needed to produce high quality alignments automatically, and the lack of an equivalent component in HMMER makes it less complete as a package. Using the default options and parameters as would be expected of an inexpert user, it was found that from identical alignments SAM consistently produces better models than HMMER and that the relative performance of the model-scoring components varies. On average, HMMER was found to be between one and three times faster than SAM when searching databases larger than 2000 sequences, SAM being faster on smaller ones. Both methods were shown to have effective low complexity and repeat sequence masking using their null models, and the accuracy of their E-values was comparable. It was found that the SAM T99 iterative database search procedure performs better than the most recent version of PSI-BLAST, but that scoring of PSI-BLAST profiles is more than 30 times faster than scoring of SAM models.

Keywords

Hidden Markov model, Markov chain, Computer science, Masking (illustration), Homology (biology), Biology, Sequence (biology), Markov model, Multiple sequence alignment, Sequence alignment, Pattern recognition (psychology), Artificial intelligence, Machine learning, Genetics, Gene, Peptide sequence

MeSH Terms

Amino Acid Sequence, Computational Biology, Markov Chains, Molecular Sequence Data, Proteins, Reproducibility of Results, Sequence Alignment

Related Publications

Accelerated Profile HMM Searches

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, pr...

2011 PLoS Computational Biology 6891 citations

Profile hidden Markov models.

Abstract The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-spe...

1998 Bioinformatics 5657 citations

Publication Info

Year: 2002
Type: article
Volume: 30
Issue: 19
Pages: 4321-4328
Citations: 166
Access: Closed

Citation Metrics

OpenAlex: 166
Influential: 19
CrossRef: 121

Cite This

Martin Madera (2002). A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Research, 30(19), 4321-4328. https://doi.org/10.1093/nar/gkf544

Identifiers

DOI: 10.1093/nar/gkf544
PMID: 12364612
PMCID: PMC140544
