Abstract

Correct prediction of the structure of protein-coding genes of higher eukaryotes is still a difficult task; therefore, public databases are heavily contaminated with mispredicted sequences. The high rate of misprediction has serious consequences because it significantly affects the conclusions that may be drawn from genome-scale sequence analyses of eukaryotic genomes. Here we present the MisPred database and computational pipeline that provide efficient means for the identification of erroneous sequences in public databases. The MisPred database contains a collection of abnormal, incomplete and mispredicted protein sequences from 19 metazoan species identified as erroneous by MisPred quality control tools in the UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, NCBI/RefSeq and EnsEMBL databases. Major releases of the database are automatically generated and updated regularly. The database (http://www.mispred.com) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in a variety of formats. DATABASE URL: http://www.mispred.com.

Keywords

UniProt, Ensembl, RefSeq, Computer science, Database, Identification (biology), Biological database, Sequence database, Information retrieval, Genome, Bioinformatics, Biology, Genomics, Gene

Publication Info

Year: 2013
Type: article
Volume: 2013
Pages: bat053-bat053
Citations: 17
Access: Closed

Citation Metrics

17 (OpenAlex)

Cite This

Alinda Nagy, László Patthy (2013). MisPred: a resource for identification of erroneous protein sequences in public databases. Database, 2013, bat053-bat053. https://doi.org/10.1093/database/bat053

Identifiers

DOI: 10.1093/database/bat053