Abstract

NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10^6 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.

Keywords

RefSeq, Annotation, Biology, Genome, Ensembl, Sequence (biology), Gene Annotation, Computational biology, Genome project, Reference genome, Feature (linguistics), Bioinformatics, Genomics, Information retrieval, Gene, Database, Genetics, Computer science

Publication Info

Year: 2008
Type: article
Volume: 37
Issue: Database
Pages: D32-D36
Citations: 740
Access: Closed

Citation Metrics

740 (OpenAlex)

Cite This

Kim D. Pruitt, Tatiana Tatusova, William Klimke et al. (2008). NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Research, 37(Database), D32-D36. https://doi.org/10.1093/nar/gkn721

Identifiers

DOI
10.1093/nar/gkn721