Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16,000 organisms, 2.4 × 10⁶ genomic records, 13 × 10⁶ proteins and 2 × 10⁶ RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).

Keywords

RefSeq, Annotation, Ensembl, Biology, Genome project, Genome, Reference genome, Sequence database, Database, Computational biology, Information retrieval, Bioinformatics, Genetics, Genomics, Computer science, Gene

Publication Info

Year
2011
Type
article
Volume
40
Issue
D1
Pages
D130-D135
Citations
1166
Access
Closed

Cite This

Kim D. Pruitt, Tatiana Tatusova, Garth Brown et al. (2011). NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Research, 40(D1), D130-D135. https://doi.org/10.1093/nar/gkr1079

Identifiers

DOI
10.1093/nar/gkr1079