Abstract

Signal peptides function as "address tags" that direct nascent proteins to their proper cellular and extracellular locations, and they have become a crucial tool for finding new drugs and for reprogramming cells for gene therapy. Using this tool effectively and in a timely manner first requires an automated method that rapidly and accurately identifies the signal peptide of a given nascent protein. With the avalanche of new protein sequences generated in the post-genomic era, this challenge has become even more urgent. In this paper, we develop a novel method for predicting signal peptide sequences and their cleavage sites in human, plant, animal, eukaryotic, Gram-positive, and Gram-negative protein sequences, respectively. The new predictor, called Signal-3L, consists of three prediction engines that work on three progressively deepening layers: (1) identifying a query protein as secretory or non-secretory with an ensemble classifier formed by fusing many individual OET-KNN (optimized evidence-theoretic K nearest neighbor) classifiers operating in PseAA (pseudo amino acid) composition spaces of various dimensions; (2) selecting a set of candidate signal peptide cleavage sites for a query secretory protein with a subsite-coupled discrimination algorithm; (3) determining the final cleavage site by fusing, through a voting system, the global sequence alignment outcomes for each of these candidates. Signal-3L combines high prediction success rates with short computational time, and hence is particularly useful for the analysis of large-scale datasets.
Signal-3L is freely available as a web server at http://chou.med.harvard.edu/bioinf/Signal-3L/ or http://202.120.37.186/bioinf/Signal-3L. To further support related research areas, the site also provides a downloadable file of the signal peptides identified by Signal-3L for all protein entries in the Swiss-Prot databank that either have no signal peptide annotation or are annotated with uncertain terms, but are classified by Signal-3L as secretory proteins. This large-scale file is prepared in Microsoft Excel, is named "Tab-Signal-3L.xls", and will be updated once a year to include new protein entries and reflect the continuing development of Signal-3L.
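
The three-layer control flow described in the abstract can be sketched as follows. This is only an illustrative toy under stated assumptions, not the authors' implementation: plain amino acid composition stands in for PseAA composition, an ordinary majority-vote KNN run at several K values stands in for the fused OET-KNN ensemble, a crude small-residue score at positions -3 and -1 stands in for the subsite-coupled discrimination algorithm, and `difflib.SequenceMatcher` stands in for global sequence alignment. All function names, parameters, and sequences are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SMALL = set("AGSCT")  # assumption: residues the toy scorer treats as "small"


def composition(seq):
    """20-dimensional amino acid frequency vector (stand-in for PseAA)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def knn_label(query_vec, training, k):
    """Majority label among the k nearest training vectors."""
    def dist(item):
        return sum((q - t) ** 2 for q, t in zip(query_vec, item[0]))
    ranked = sorted(training, key=dist)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def layer1_is_secretory(seq, training, ks=(1, 3, 5)):
    """Layer 1 (toy): fuse several KNN classifiers by majority vote."""
    vec = composition(seq)
    votes = Counter(knn_label(vec, training, k) for k in ks)
    return votes.most_common(1)[0][0] == "secretory"


def layer2_candidates(seq, top_n=3, lo=5, hi=40):
    """Layer 2 (toy): rank cut positions; the peptide would be seq[:pos]."""
    def score(pos):
        return (seq[pos - 1] in SMALL) + (seq[pos - 3] in SMALL)
    positions = range(max(lo, 3), min(hi, len(seq)) + 1)
    return sorted(positions, key=lambda p: (-score(p), p))[:top_n]


def layer3_vote(seq, candidates, known_peptides):
    """Layer 3 (toy): each known peptide votes for its most similar candidate."""
    votes = Counter()
    for ref in known_peptides:
        best = max(candidates,
                   key=lambda p: SequenceMatcher(None, seq[:p], ref).ratio())
        votes[best] += 1
    return votes.most_common(1)[0][0]


def predict(seq, training, known_peptides):
    """Return a predicted signal peptide length, or None if non-secretory."""
    if not layer1_is_secretory(seq, training):
        return None
    return layer3_vote(seq, layer2_candidates(seq), known_peptides)


# Made-up toy data, for illustration only.
SECRETORY = ["MKLLLLAVLLALA", "MALLVLLLAAASA", "MKALLLVLLSLAA"]
NON_SECRETORY = ["MDEEKKRDEEKRD", "MEEDDKKRREEDK", "MDKEERKDDEKRE"]
TRAINING = ([(composition(s), "secretory") for s in SECRETORY]
            + [(composition(s), "non-secretory") for s in NON_SECRETORY])
KNOWN = ["MKLLLLLASA", "MRLLLLLASA"]

if __name__ == "__main__":
    print(predict("MKLLLLLASALL", TRAINING, KNOWN))
```

The point of the sketch is the layering: the inexpensive secretory/non-secretory decision runs first, so the costlier candidate generation and alignment-style voting are applied only to sequences that pass layer 1.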

Keywords

Signal peptide; Pseudo amino acid composition; Computer science; SIGNAL (programming language); Classifier (UML); Protein sequencing; Cleavage (geology); Artificial intelligence; Computational biology; Pattern recognition (psychology); Data mining; Peptide sequence; Machine learning; Biology; Gene; Biochemistry; Subcellular localization

MeSH Terms

Algorithms; Amino Acid Sequence; Amino Acids; Databases, Protein; Information Storage and Retrieval; Molecular Sequence Data; Protein Sorting Signals; Sequence Analysis, Protein; Software

Publication Info

Year: 2007
Type: article
Volume: 363
Issue: 2
Pages: 297-303
Citations: 237
Access: Closed

Citation Metrics

OpenAlex: 237
Influential: 21
CrossRef: 201

Cite This

Hong‐Bin Shen, Kuo‐Chen Chou (2007). Signal-3L: A 3-layer approach for predicting signal peptides. Biochemical and Biophysical Research Communications, 363(2), 297-303. https://doi.org/10.1016/j.bbrc.2007.08.140

Identifiers

DOI: 10.1016/j.bbrc.2007.08.140
PMID: 17880924

Data Quality

Data completeness: 81%