The InterPro protein families and domains database: 20 years on

Abstract

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
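The integration step described in the abstract, where signatures from several member databases that represent the same family, domain or site are combined into one entry, can be pictured with a minimal sketch. This is not InterPro's actual data model or code: the `Signature` class, the `group_by_entry` function, and every accession and database name below are hypothetical placeholders chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Signature:
    """A predictive model from one member database (hypothetical schema)."""
    accession: str          # placeholder accession, not a real identifier
    source_db: str          # placeholder member database name
    entry: Optional[str]    # integrated entry it belongs to, or None


def group_by_entry(signatures: Iterable[Signature]) -> Dict[str, List[Signature]]:
    """Group signatures under the entry that integrates them.

    Signatures with no entry (entry is None) are left out of the result.
    """
    grouped = defaultdict(list)
    for sig in signatures:
        if sig.entry is not None:
            grouped[sig.entry].append(sig)
    return dict(grouped)


# Made-up example data: two databases describe the same domain (ENTRY_1),
# one signature stands alone in ENTRY_2, and one is not integrated at all.
SIGNATURES = [
    Signature("SIG_A1", "db_a", "ENTRY_1"),
    Signature("SIG_B7", "db_b", "ENTRY_1"),
    Signature("SIG_C3", "db_c", "ENTRY_2"),
    Signature("SIG_A9", "db_a", None),
]

ENTRIES = group_by_entry(SIGNATURES)

for entry_id, members in sorted(ENTRIES.items()):
    accessions = ", ".join(f"{m.source_db}:{m.accession}" for m in members)
    print(f"{entry_id}: {accessions}")
```

In this toy model an entry is just a key that several signatures share; the descriptions, literature references and GO terms mentioned in the abstract would be attached to the entry rather than to each individual signature.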

Keywords

Biology, Computational biology, Genetics, Bioinformatics, Evolutionary biology, Database

Publication Info

Year: 2020
Type: article
Volume: 49
Issue: D1
Pages: D344-D354
Citations: 2201
Access: Closed

Cite This

APA Style

Matthias Blum, Hsin-Yu Chang, Sara Chuguransky et al. (2020). The InterPro protein families and domains database: 20 years on. Nucleic Acids Research, 49(D1), D344-D354. https://doi.org/10.1093/nar/gkaa977

Identifiers

DOI: 10.1093/nar/gkaa977