Abstract

We have developed a method for identifying fold families in the protein structure data bank. Pairwise sequence alignments are first performed to extract families of homologous proteins having 35% or more sequence identity. Representatives are selected with the best resolution and R-factor to give a nonhomologous data set. Subsequent structure comparisons between all members of this set detect homologous folds with low sequence identity but highly conserved structures. By softening the requirement on structural similarity, families of analogous proteins are obtained that have related folds but more diverse structures. Representatives are selected to give a non-analogous data set. Starting with 1410 chains from the Brookhaven Data Bank, we generate a set of 150 nonhomologous folds and a set of 112 non-analogous folds. Analysis of sequence and structure conservation within the larger families shows the globins to be the most highly conserved family and the TIM barrels the most weakly conserved.
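The first stage described above (grouping chains at 35% or more sequence identity, then keeping the chain with the best resolution and R-factor) can be illustrated with a minimal sketch. The abstract does not say how pairwise identities are linked into families, so single-linkage clustering is an assumption here, as is breaking resolution ties by R-factor. The `Chain` class, the function names, the chain identifiers, and all numeric values are hypothetical and chosen only for illustration; the pairwise identities are taken as given rather than computed by alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    chain_id: str
    resolution: float  # in angstroms; lower is better
    r_factor: float    # lower is better


def cluster_families(chains, identity, threshold=35.0):
    """Group chains into families by single linkage (an assumption)
    on precomputed pairwise percent sequence identity."""
    parent = {c.chain_id: c.chain_id for c in chains}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for c in chains:
        families.setdefault(find(c.chain_id), []).append(c)
    return list(families.values())


def pick_representatives(families):
    """Keep one chain per family: best resolution, then best R-factor."""
    return [min(f, key=lambda c: (c.resolution, c.r_factor)) for f in families]


# Hypothetical input data, not taken from the paper.
chains = [
    Chain("globA", 1.6, 0.18),
    Chain("globB", 2.0, 0.20),
    Chain("globC", 1.6, 0.16),
    Chain("barrelA", 2.5, 0.19),
]
identity = {
    ("globA", "globB"): 48.0,
    ("globB", "globC"): 40.0,
    ("globA", "globC"): 30.0,
    ("globA", "barrelA"): 12.0,
}

families = cluster_families(chains, identity)
representatives = pick_representatives(families)
print(sorted(c.chain_id for c in representatives))
```

In this toy input, globA and globC fall below the threshold against each other but are joined through globB, which is the behaviour single linkage implies; a stricter linkage rule would split them. The later structure-comparison stages could reuse the same grouping step with a structural similarity score in place of sequence identity.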

Keywords

Computational biology; Threading (protein sequence); Genetics; Protein superfamily; Fold (higher-order function); Biology; Conserved sequence; Sequence alignment; Sequence (biology); Protein structure; Peptide sequence; Computer science; Biochemistry; Gene

Publication Info

Year: 1993
Type: article
Volume: 6
Issue: 5
Pages: 485-500
Citations: 222
Access: Closed

Citation Metrics

222 (OpenAlex)

Cite This

Christine Orengo, Tomas P. Flores, William R. Taylor et al. (1993). Identification and classification of protein fold families. Protein Engineering Design and Selection, 6(5), 485-500. https://doi.org/10.1093/protein/6.5.485

Identifiers

DOI: 10.1093/protein/6.5.485