Abstract
An efficient means for generating mutation data matrices from large numbers of protein sequences is presented here. By means of an approximate peptide-based sequence comparison algorithm, the set of sequences is clustered at the 85% identity level. The most closely related pairs of sequences are aligned, and the observed amino acid exchanges are tallied in a matrix. The raw mutation frequency matrix is processed in a similar way to that described by Dayhoff et al. (1978), so the resulting matrices may be easily used in current sequence analysis applications in place of the standard mutation data matrices, which have not been updated for 13 years. The method is fast enough to process the entire SWISS-PROT databank in 20 h on a Sun SPARCstation 1, and to generate a matrix from a specific family or class of proteins in minutes. Differences observed between our 250 PAM mutation data matrix and the matrix calculated by Dayhoff et al. are briefly discussed.
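The tallying step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `-` gap character, the assumption that pairs arrive already aligned, and the use of the 85% figure as a per-pair filter are all illustrative choices. The peptide-based clustering and the Dayhoff-style normalisation of the raw counts are not shown.

```python
from collections import Counter

# The 20 standard amino acids; any other character (e.g. '-') is treated as a gap.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def percent_identity(a, b):
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(a, b)
            if x in AMINO_ACIDS and y in AMINO_ACIDS]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def tally_exchanges(aligned_pairs, min_identity=85.0):
    """Count observed amino acid exchanges in closely related aligned pairs.

    Pairs below min_identity are skipped (a hypothetical filter for this
    sketch). Each exchange is counted in both directions, so the raw
    matrix is symmetric.
    """
    counts = Counter()
    for a, b in aligned_pairs:
        if percent_identity(a, b) < min_identity:
            continue
        for x, y in zip(a, b):
            if x in AMINO_ACIDS and y in AMINO_ACIDS and x != y:
                counts[(x, y)] += 1
                counts[(y, x)] += 1
    return counts


# Toy input: one close pair, one distant pair, one pair with a gap.
pairs = [
    ("ACDEFGHIKL", "ACDEFGHIKM"),  # 90% identical, one L<->M exchange
    ("ACDE", "AGGG"),              # 25% identical, skipped
    ("ACDEFGHIK-", "ACDEFGHIKL"),  # gap column ignored, no exchanges
]
raw_counts = tally_exchanges(pairs)
```

Counting each exchange in both directions mirrors the usual assumption that the direction of a substitution cannot be inferred from a pairwise alignment alone.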
Related Publications
Amino acid substitution matrices from protein blocks.
Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The...
Performance evaluation of amino acid substitution matrices
Several choices of amino acid substitution matrices are currently available for searching and alignment applications. These choices were evaluated using the BLAST searc...
New scoring matrix for amino acid residue exchanges based on residue characteristic physical parameters
Based on residue characteristic physical parameters, a new scoring matrix, called EMPAR, for amino acid exchanges in proteins was obtained. When comparing protein sequences for ...
Construction of validated, non-redundant composite protein sequence databases
A strategy has been developed for the construction of a validated, comprehensive composite protein sequence database. Entries are amalgamated from primary source data bases by a...
An algorithm for secondary structure determination in proteins based on sequence similarity
A secondary structure prediction algorithm is proposed on the hypothesis that short homologous sequences of amino acids have the same secondary structure tendencies. Comparisons...
Publication Info
- Year: 1992
- Type: article
- Volume: 8
- Issue: 3
- Pages: 275-282
- Citations: 7003
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/8.3.275