Abstract

The entire protein sequence database has been exhaustively matched. Definitive mutation matrices and models for scoring gaps were obtained from the matching and used to organize the sequence database as sets of evolutionarily connected components. The methods developed are general and can be used to manage sequence data generated by major genome sequencing projects. The alignments made possible by the exhaustive matching are the starting point for successful de novo prediction of the folded structures of proteins, for reconstructing sequences of ancient proteins and metabolisms in ancient organisms, and for obtaining new perspectives in structural biochemistry.

Keywords

Sequence (biology), Matching (statistics), Computational biology, Protein sequencing, Sequence database, Genome, Computer science, Sequence alignment, DNA sequencing, Biology, Data mining, Genetics, Peptide sequence, DNA, Gene, Mathematics

Publication Info

Year: 1992
Type: article
Volume: 256
Issue: 5062
Pages: 1443-1445
Citations: 842
Access: Closed

Cite This

Gastón H. Gonnet, Mark A. Cohen, Steven A. Benner (1992). Exhaustive Matching of the Entire Protein Sequence Database. Science, 256(5062), 1443-1445. https://doi.org/10.1126/science.1604319

Identifiers

DOI
10.1126/science.1604319