Abstract

Motivation: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches.

Results: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000–10 000 papers are curated in UniProt each year while curators evaluate 50 000–70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2–3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable.

Availability and implementation: UniProt is freely available at http://www.uniprot.org/.

Supplementary information: Supplementary data are available at Bioinformatics online.
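To make the gap between evaluated and curated papers concrete, the yearly figures quoted in the Results can be turned into a share. The sketch below is illustrative arithmetic only: the variable names and the `ratio_bounds` helper are hypothetical, and pairing the extremes of the two ranges is an assumption, since the abstract does not say which yearly totals belong together.

```python
# Figures quoted in the abstract (papers per year), as (low, high) ranges.
curated_per_year = (8_000, 10_000)
evaluated_per_year = (50_000, 70_000)


def ratio_bounds(numerator, denominator):
    """Return (min, max) of numerator/denominator over the two ranges.

    Assumption: the extremes may be paired freely, which gives the widest
    possible interval rather than a per-year figure.
    """
    low = numerator[0] / denominator[1]
    high = numerator[1] / denominator[0]
    return low, high


low, high = ratio_bounds(curated_per_year, evaluated_per_year)
print(f"Curated share of evaluated papers: {low:.1%} to {high:.1%}")
# prints "Curated share of evaluated papers: 11.4% to 20.0%"
```

Under these assumptions, roughly one in five to one in nine evaluated papers ends up curated, which is consistent with the abstract's point that counting curated publications alone understates curator workload.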

Keywords

UniProt; Computer science; Data curation; Scope (computer science); World Wide Web; Data science; Biology

MeSH Terms

Data Curation; Data Mining; Databases, Protein; Humans; Knowledge Bases; PubMed; Review Literature as Topic; Statistics as Topic

Publication Info

Year: 2017
Type: article
Volume: 33
Issue: 21
Pages: 3454-3460
Citations: 134
Access: Closed

Citation Metrics

OpenAlex: 134
Influential: 8
CrossRef: 111

Cite This

Sylvain Poux, Cecilia Arighi, Michele Magrane et al. (2017). On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics, 33(21), 3454-3460. https://doi.org/10.1093/bioinformatics/btx439

Identifiers

DOI: 10.1093/bioinformatics/btx439
PMID: 29036270
PMCID: PMC5860168

Data Quality

Data completeness: 90%