Abstract

Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the SWISS-PROT 22 database for other sequences that are members of the given protein family, or contain the given domain. The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the HMM is able to distinguish members of these families from non-members with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appears to have a slight advantage over PROFILESEARCH in terms of lower rates of false negatives and false positives, even though the HMM is trained using only unaligned sequences, whereas PROFILESEARCH requires aligned training sequences. Our results suggest the presence of an EF-hand calcium binding motif in a highly conserved and evolutionarily preserved putative intracellular region of 155 residues in the alpha-1 subunit of L-type calcium channels, which play an important role in excitation-contraction coupling. This region has been suggested to contain the functional domains that are typical or essential for all L-type calcium channels regardless of whether they couple to ryanodine receptors, conduct ions or both.
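
The discrimination tests described in the abstract rest on measuring how well a sequence fits a trained HMM. As a rough illustration only, the sketch below scores a sequence with the standard forward algorithm, which sums the probability over all state paths. It is not the model architecture or scoring procedure used in the paper: the two-state model, the two-letter alphabet, every probability, and the function names `log_forward` and `log_odds` are hypothetical, and the uniform background in `log_odds` is an assumed null model.

```python
import math

# Hypothetical two-state toy model over a two-letter alphabet; every number
# here is made up for illustration and is not taken from the paper.
ALPHABET = "AD"
START = {"s1": 0.5, "s2": 0.5}
TRANS = {"s1": {"s1": 0.9, "s2": 0.1}, "s2": {"s1": 0.2, "s2": 0.8}}
EMIT = {"s1": {"A": 0.8, "D": 0.2}, "s2": {"A": 0.3, "D": 0.7}}


def log_forward(seq, start=START, trans=TRANS, emit=EMIT):
    """Return log P(seq | model), summed over all state paths."""
    if not seq:
        raise ValueError("sequence must contain at least one symbol")
    states = list(start)
    # alpha[s] = P(symbols seen so far, current state = s), up to rescaling
    alpha = {s: start[s] * emit[s].get(seq[0], 0.0) for s in states}
    log_prob = 0.0
    for symbol in seq[1:]:
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this prefix
        log_prob += math.log(scale)  # accumulate the factor we divide out
        alpha = {
            t: emit[t].get(symbol, 0.0)
            * sum(alpha[s] / scale * trans[s][t] for s in states)
            for t in states
        }
    total = sum(alpha.values())
    return log_prob + math.log(total) if total > 0.0 else float("-inf")


def log_odds(seq, alphabet=ALPHABET):
    """Score seq against a uniform background (an assumed null model)."""
    return log_forward(seq) - len(seq) * math.log(1.0 / len(alphabet))
```

With a score of this kind, sequences that fit the model better than the background receive positive values, so a threshold on the score can separate likely family members from non-members. Rescaling at each step keeps the computation stable for long sequences, where raw probabilities would underflow.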

Keywords

Hidden Markov model; False positive paradox; Protein family; Globin; Sequence database; Sequence (biology); Pattern recognition (psychology); Computational biology; Computer science; Set (abstract data type); Sequence alignment; Artificial intelligence; Biology; Genetics; Peptide sequence; Gene

MeSH Terms

Algorithms; Amino Acid Sequence; Animals; Binding Sites; Calcium; Globins; Humans; Markov Chains; Molecular Sequence Data; Protein Kinases; Proteins; Sequence Homology, Amino Acid

Publication Info

Year: 1994
Type: article
Volume: 235
Issue: 5
Pages: 1501-1531
Citations: 1934
Access: Closed

Citation Metrics

OpenAlex: 1934
Influential: 104
CrossRef: 1320

Cite This

Anders Krogh, Michael Brown, Shahzad I. Mian et al. (1994). Hidden Markov Models in Computational Biology. Journal of Molecular Biology, 235(5), 1501-1531. https://doi.org/10.1006/jmbi.1994.1104

Identifiers

DOI: 10.1006/jmbi.1994.1104
PMID: 8107089
