NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

Kim D. Pruitt; Tatiana Tatusova; Garth Brown; Donna Maglott

doi:10.1093/nar/gkr1079

Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16,000 organisms, 2.4 × 10^6 genomic records, 13 × 10^6 proteins and 2 × 10^6 RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).

Keywords

RefSeq; Annotation; Ensembl; Biology; Genome project; Genome; Reference genome; Sequence database; Database; Computational biology; Information retrieval; Bioinformatics; Genetics; Genomics; Computer science; Gene

Related Publications

NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Kim D. Pruitt, Tatiana Tatusova, D. R. Maglott

NCBI's reference sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and protei...

2006 Nucleic Acids Research 4555 citations

RefSeq: an update on mammalian reference sequences

Kim D. Pruitt, Garth Brown, Susan M. Hiatt, +26 more

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records deriv...

2013 Nucleic Acids Research 994 citations

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O’Leary, Matt W. Wright, J. Rodney Brister, +52 more

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein ...

2015 Nucleic Acids Research 6668 citations

Current status and new features of the Consensus Coding Sequence database

Catherine M. Farrell, Nuala A. O’Leary, Rachel Harte, +40 more

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically a...

2013 Nucleic Acids Research 154 citations

NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Kim D. Pruitt

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequen...

2004 Nucleic Acids Research 1622 citations

Publication Info

Year: 2011
Type: article
Volume: 40
Issue: D1
Pages: D130-D135
Citations: 1166
Access: Closed

Cite This

APA Style

Kim D. Pruitt, Tatiana Tatusova, Garth Brown et al. (2011). NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Research, 40(D1), D130-D135. https://doi.org/10.1093/nar/gkr1079

Identifiers

DOI: 10.1093/nar/gkr1079