Abstract
Functioning as an "address tag" that directs nascent proteins to their proper cellular and extracellular locations, signal peptides have become a crucial tool for finding new drugs or reprogramming cells for gene therapy. To use this tool effectively and in a timely manner, however, an automated method is first needed for rapidly and accurately identifying the signal peptide of a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, this challenge has become even more urgent. In this paper, we present a novel method for predicting signal peptide sequences and their cleavage sites in human, plant, animal, eukaryotic, Gram-positive, and Gram-negative protein sequences. The new predictor, called Signal-3L, consists of three prediction engines, one for each of three progressively deeper layers: (1) identifying a query protein as secretory or non-secretory with an ensemble classifier formed by fusing many individual OET-KNN (optimized evidence-theoretic K nearest neighbor) classifiers operating in PseAA (pseudo amino acid) composition spaces of various dimensions; (2) selecting a set of candidate signal peptide cleavage sites for a query secretory protein with a subsite-coupled discrimination algorithm; (3) determining the final cleavage site by fusing, through a voting system, the global sequence alignment outcomes for each of these candidates. Signal-3L combines high prediction success rates with short computation times and is therefore particularly useful for analyzing large-scale datasets.
Signal-3L is freely available as a web server at http://chou.med.harvard.edu/bioinf/Signal-3L/ or http://202.120.37.186/bioinf/Signal-3L. To further support related research, the site also provides a downloadable file of the signal peptides identified by Signal-3L for all Swiss-Prot protein entries that either lack signal peptide annotations or are annotated with uncertain terms, but that Signal-3L classifies as secretory. This large-scale file is prepared in Microsoft Excel, is named "Tab-Signal-3L.xls", and will be updated once a year to include new protein entries and to reflect the continuing development of Signal-3L.
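The first layer described above, fusing several nearest-neighbor classifiers by voting, can be illustrated with a minimal sketch. This is not the Signal-3L implementation: it assumes plain K-nearest-neighbor classifiers instead of OET-KNN, the ordinary 20-dimensional amino acid composition instead of PseAA composition, and an ensemble that varies K rather than the dimension of the feature space. The function names and the toy training sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy training set (NOT real annotated proteins).
TRAINING = [
    ("LLLLAAAVVL", "secretory"),
    ("ALLVLLAALL", "secretory"),
    ("LLAVLLLLAA", "secretory"),
    ("DEDEKKDEKR", "non-secretory"),
    ("KKDEDERRDE", "non-secretory"),
    ("EEDDKKRRDE", "non-secretory"),
]


def composition(seq):
    """Return the 20-D amino acid composition (residue fractions) of seq."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def knn_predict(query, training, k):
    """Plain K-nearest-neighbor vote (assumption: stands in for OET-KNN)."""
    neighbors = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def ensemble_predict(seq, training_seqs, ks=(1, 3, 5)):
    """Fuse several KNN classifiers (one per K) by majority vote."""
    training = [(composition(s), label) for s, label in training_seqs]
    query = composition(seq)
    votes = Counter(knn_predict(query, training, k) for k in ks)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(ensemble_predict("LLLAALLVLL", TRAINING))
    print(ensemble_predict("DDEEKKRDEK", TRAINING))
```

Using odd values of K and an odd number of classifiers avoids tied votes in this two-class setting. A real predictor would need a proper benchmark dataset and the evidence-theoretic fusion described in the paper.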
Publication Info
- Year: 2007
- Type: article
- Volume: 363
- Issue: 2
- Pages: 297-303
- Citations: 237
- Access: Closed
Identifiers
- DOI: 10.1016/j.bbrc.2007.08.140
- PMID: 17880924