Abstract

Summary: UniProt Archive (UniParc) is the most comprehensive, non-redundant protein sequence database available. Its protein sequences are retrieved from predominant, publicly accessible resources. All new and updated protein sequences are collected and loaded daily into UniParc for full coverage. To avoid redundancy, each unique sequence is stored only once with a stable protein identifier, which can be used later in UniParc to identify the same protein in all source databases. When proteins are loaded into the database, database cross-references are created to link them to the origins of the sequences. As a result, performing a sequence search against UniParc is equivalent to performing the same search against all databases cross-referenced by UniParc. UniParc contains only protein sequences and database cross-references; all other information must be retrieved from the source databases.

Availability: http://www.ebi.ac.uk/uniparc/
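The storage model described in the summary (one record per unique sequence, a stable identifier, and cross-references back to each source) can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not UniParc's actual schema or code: the class `ToyArchive`, the `ID000001`-style identifiers, and the database names `DB_A` and `DB_B` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArchive:
    """Toy non-redundant archive (hypothetical; not UniParc's real schema)."""

    # Maps each unique sequence to its archive identifier.
    ids_by_sequence: dict = field(default_factory=dict)
    # Maps each archive identifier to a set of (database, accession) pairs.
    xrefs_by_id: dict = field(default_factory=dict)

    def load(self, database: str, accession: str, sequence: str) -> str:
        """Store the sequence once and record where it came from."""
        seq = sequence.strip().upper()
        archive_id = self.ids_by_sequence.get(seq)
        if archive_id is None:
            # First time this sequence is seen: assign a new identifier.
            # The identifier format here is an illustrative assumption.
            archive_id = f"ID{len(self.ids_by_sequence) + 1:06d}"
            self.ids_by_sequence[seq] = archive_id
            self.xrefs_by_id[archive_id] = set()
        # Always add a cross-reference to the originating database entry.
        self.xrefs_by_id[archive_id].add((database, accession))
        return archive_id

    def sources(self, sequence: str) -> list:
        """Return every (database, accession) that holds this exact sequence."""
        archive_id = self.ids_by_sequence.get(sequence.strip().upper())
        return sorted(self.xrefs_by_id.get(archive_id, set()))


archive = ToyArchive()
first = archive.load("DB_A", "A001", "MKTAYIAKQR")
second = archive.load("DB_B", "B042", "MKTAYIAKQR")  # same sequence, other source
third = archive.load("DB_A", "A002", "MSLLTEVETY")   # different sequence

print(first == second)                 # True: identical sequences share one identifier
print(first == third)                  # False: a new sequence gets a new identifier
print(archive.sources("MKTAYIAKQR"))   # cross-references to both source entries
```

In this sketch a single exact-match lookup returns the cross-references for every source that holds the sequence, which mirrors the summary's point that searching the archive stands in for searching each cross-referenced database. Real sequence searches are similarity-based rather than exact-match, so the lookup here is only a simplification.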

Keywords

UniProt, Identifier, Sequence database, Computer science, Database, Protein sequencing, Redundancy (engineering), Sequence (biology), Information retrieval, Data mining, Peptide sequence, Biology, Programming language

Publication Info

Year: 2004
Type: article
Volume: 20
Issue: 17
Pages: 3236-3237
Citations: 209
Access: Closed

Citation Metrics

209 (OpenAlex)

Cite This

Rasko Leinonen, Federico Garcia Diez, David Binns et al. (2004). UniProt archive. Bioinformatics, 20(17), 3236-3237. https://doi.org/10.1093/bioinformatics/bth191

Identifiers

DOI
10.1093/bioinformatics/bth191