UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase

Alistair MacDougall, Vladimir Volynkin, Rabie Saidi, Diego Poggioli, Hermann Zellner, Emma Hatton-Ellis, Vishal Joshi, Claire O’Donovan, Sandra Orchard, Andrea H Auchincloss, Delphine Baratin, Jerven Bolleman, Elisabeth Coudert, Leyla Jael Castro, Chantal Hulo, Patrick Masson, Ivo Pedruzzi, Catherine Rivoire, Cecilia Arighi, Qinghua Wang, Chuming Chen, Hongzhan Huang, John S. Garavelli, C R Vinayaka, Lai-Su Yeh, Darren A. Natale, Kati Laiho, María Martin, Alexandre Renaux, Klemens Pichler, Alex Bateman, Alan Bridge, Cathy Wu, Lionel Breuza, Damien Lieberherr, Michele Magrane, Peter B. McGarvey, Sylvain Poux, Manuela Pruess, Shriya Raj, Nicole Redaschi, Lucila Aimo, Ghislaine Argoud‐Puy, Kristian B. Axelsen, Emmanuel Boutet, Emily Bowler-Barnett, Ramona Britto, Hema Bye‐A‐Jee, Cristina Casals‐Casas, Paul Denny, Anne Estreicher, Maria Livia Famiglietti, Marc Feuermann, Penelope Garmiri, Arnaud Gos, Nadine Gruaz, Nevila Hyka‐Nouspikel, Florence Jungo, Philippe Le Mercier, Antonia Lock, Yvonne Lussi, Anne Morgat, Sandrine Pilbout, Lucille Pourcel, Karen Ross, Christian J A Sigrist, Elena Speretta, Shyamala Sundaram, Nidhi Tyagi, Kate Warner, Rossana Zaru, Shadab Ahmed, Emanuele Alpi, Leslie Arminski, Parit Bansal, Teresa Batista Neto, Beatrice Cuche, Austra Cukura
2020 Bioinformatics 66 citations

Abstract

Motivation: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge.

Results: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB.

Availability and implementation: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run them are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.
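The selection-and-propagation idea described in the abstract can be pictured with a small toy model. The sketch below is only an illustration under stated assumptions: the `Protein` and `Rule` classes, the signature labels (`SIG_A`, `SIG_B`), the accessions and the annotation text are all hypothetical, and the code does not reflect the actual UniRule rule format or the UniFIRE implementation. It assumes a rule is simply a set of required signatures plus one taxonomic constraint, and that annotation is copied to every unreviewed protein satisfying both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Hypothetical, simplified protein record."""
    accession: str
    signatures: frozenset      # family signature IDs matched by the sequence
    lineage: frozenset         # names of taxa in the organism's lineage
    reviewed: bool = False     # True for expert-reviewed records


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified rule: conditions plus annotation to apply."""
    rule_id: str
    required_signatures: frozenset
    required_taxon: str
    annotations: tuple


def rule_applies(rule: Rule, protein: Protein) -> bool:
    """A rule fires when all its signatures match and the taxon is in the lineage."""
    has_signatures = rule.required_signatures <= protein.signatures
    in_taxon = rule.required_taxon in protein.lineage
    return has_signatures and in_taxon


def propagate(rules, proteins):
    """Return {accession: [annotations]} for unreviewed proteins matched by any rule."""
    predicted = {}
    for protein in proteins:
        if protein.reviewed:
            continue  # reviewed records already carry curated annotation
        hits = [a for r in rules if rule_applies(r, protein) for a in r.annotations]
        if hits:
            predicted[protein.accession] = hits
    return predicted


# Toy data: every identifier below is made up for illustration.
rule = Rule("RULE_1", frozenset({"SIG_A"}), "Bacteria", ("Function: example activity",))
proteins = [
    Protein("X1", frozenset({"SIG_A", "SIG_B"}), frozenset({"Bacteria", "Proteobacteria"})),
    Protein("X2", frozenset({"SIG_A"}), frozenset({"Eukaryota"})),
    Protein("X3", frozenset({"SIG_A"}), frozenset({"Bacteria"}), reviewed=True),
]
result = propagate([rule], proteins)
print(result)
```

In this toy run only `X1` receives the annotation: `X2` fails the taxonomic constraint and `X3` is skipped because it is already reviewed. Real rules combine further constraints that this sketch leaves out.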

Keywords

UniProt, Annotation, Computer science, Resource (disambiguation), Information retrieval, World Wide Web, Artificial intelligence, Biology

Publication Info

Year
2020
Type
article
Volume
36
Issue
17
Pages
4643-4648
Citations
66
Access
Closed

Citation Metrics

66
OpenAlex

Cite This

Alistair MacDougall, Vladimir Volynkin, Rabie Saidi et al. (2020). UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase. Bioinformatics, 36(17), 4643-4648. https://doi.org/10.1093/bioinformatics/btaa485

Identifiers

DOI
10.1093/bioinformatics/btaa485