Abstract

Summary: The accuracy of current signal peptide predictors is outstanding. The most successful predictors are based on neural networks and hidden Markov models, reaching a sensitivity of 99% and an accuracy of 95%. Here, we demonstrate that the popular BLASTP alignment tool can be tuned for signal peptide prediction, reaching the same high level of prediction success. Alignment-based techniques provide additional benefits. In spite of their high success rates, signal peptide predictors still yield false predictions. Simple sequences such as polyvaline, for example, are predicted as signal peptides. The general architecture of learning systems makes it difficult to trace the cause of such problems. False predictions of this kind can be recognized, or avoided altogether, by using sequence comparison techniques. Based on these results, we have implemented a public web service called Signal-BLAST. Predictions returned by Signal-BLAST are transparent and easy to analyze.

Availability: Signal-BLAST is available online at http://sigpep.services.came.sbg.ac.at/signalblast.html

Contact: sippl@came.sbg.ac.at
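The abstract does not spell out how alignment hits are turned into a prediction, so the following is only a minimal sketch of one plausible reading: transfer the annotation of the best-scoring reference hit to the query. It assumes hits arrive in the standard 12-column BLAST tabular layout (`-outfmt 6`). The `labels` mapping, the function names, and both thresholds are hypothetical placeholders and are not taken from the paper or from Signal-BLAST.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    subject_id: str    # identifier of the matched reference sequence
    bit_score: float   # alignment bit score reported by BLAST
    query_start: int   # 1-based query position where the alignment begins


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular output (default 12 tab-separated columns)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        # Column 2 = subject id, column 7 = query start, column 12 = bit score.
        hits.append(Hit(cols[1], float(cols[11]), int(cols[6])))
    return hits


def predict(
    hits: List[Hit],
    labels: Dict[str, bool],
    min_bit_score: float = 30.0,   # assumed cutoff, illustrative only
    max_query_start: int = 5,      # assumed: alignment must start near the N-terminus
) -> Tuple[bool, Optional[Hit]]:
    """Return (has_signal_peptide, supporting_hit) by best-hit label transfer."""
    candidates = [
        h for h in hits
        if h.subject_id in labels
        and h.bit_score >= min_bit_score
        and h.query_start <= max_query_start
    ]
    if not candidates:
        return False, None  # no usable evidence: default to "no signal peptide"
    best = max(candidates, key=lambda h: h.bit_score)
    return labels[best.subject_id], best
```

Returning the supporting hit alongside the label mirrors the transparency argument made in the abstract: a surprising prediction can be traced back to the specific reference sequence and alignment that produced it.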

Keywords

SIGNAL (programming language); Computer science; Hidden Markov model; Artificial intelligence; Sequence (biology); Machine learning; TRACE (psycholinguistics); Artificial neural network; Signal peptide; Data mining; Pattern recognition (psychology); Peptide sequence; Biology

MeSH Terms

Computational Biology; Databases, Protein; Protein Sorting Signals; Sequence Alignment; Sequence Analysis, Protein

Publication Info

Year
2008
Type
article
Volume
24
Issue
19
Pages
2172-2176
Citations
134
Access
Closed

Citation Metrics

OpenAlex
134
Influential
10
CrossRef
119

Cite This

Karl H. Frank, Manfred J. Sippl (2008). High-performance signal peptide prediction based on sequence alignment techniques. Bioinformatics, 24(19), 2172-2176. https://doi.org/10.1093/bioinformatics/btn422

Identifiers

DOI
10.1093/bioinformatics/btn422
PMID
18697773
