Abstract

Motivation: Similarity-based methods have been widely used to infer the properties of genes and gene products that have little or no experimental annotation. New approaches that overcome the limitations of methods relying solely on sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins.

Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach.

Availability and implementation: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/.

Contact: tdogan@ebi.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
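The F-score reported in the Results is the harmonic mean of precision and recall. The following is a minimal sketch of how such a score could be computed when predicted GO terms are compared against reference GO terms for each protein. The function names `f_score` and `mean_f_score`, the per-protein averaging and the example GO identifiers are assumptions made for illustration; the abstract does not specify how the authors aggregated their cross-validation results.

```python
from typing import Iterable, Set, Tuple


def f_score(predicted: Set[str], reference: Set[str]) -> float:
    """Harmonic mean of precision and recall for one protein.

    Hypothetical helper: `predicted` and `reference` are sets of GO term IDs.
    Returns 0.0 when either set is empty or the sets do not overlap.
    """
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def mean_f_score(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Average the per-protein F-scores (an assumed aggregation scheme)."""
    scores = [f_score(pred, ref) for pred, ref in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative data only: one protein with two predicted terms, one of which
# matches the single reference term.
example = f_score({"GO:0005524", "GO:0004672"}, {"GO:0005524"})
```

In this toy case precision is 1/2 and recall is 1/1, so the F-score is 2/3. A different aggregation, for example pooling all predictions before computing precision and recall, would generally give a different average.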

Keywords

UniProt; Annotation; Computer science; Similarity (geometry); Domain (mathematical analysis); Set (abstract data type); Gene ontology; Computational biology; Information retrieval; Data mining; Artificial intelligence; Gene; Biology; Genetics

MeSH Terms

Amino Acid Sequence; Databases, Protein; Knowledge Bases; Molecular Sequence Annotation; Proteins

Publication Info

Year: 2016
Type: article
Volume: 32
Issue: 15
Pages: 2264-2271
Citations: 55
Access: Closed

Citation Metrics

OpenAlex: 55
Influential: 2
CrossRef: 40

Cite This

Tunca Doğan, Alistair MacDougall, Rabie Saidi et al. (2016). UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB. Bioinformatics, 32(15), 2264-2271. https://doi.org/10.1093/bioinformatics/btw114

Identifiers

DOI: 10.1093/bioinformatics/btw114
PMID: 27153729
PMCID: PMC4965628
