NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

Keywords

RefSeq, GenBank, Biology, Sequence database, Ensembl, Annotation, Genome, dbSNP, Database, Sequence (biology), Computational biology, Reference genome, Genome project, Bioinformatics, Genetics, Genomics, Gene, Computer science, Single-nucleotide polymorphism

Publication Info

Year: 2004
Type: article
Volume: 33
Issue: Database issue
Pages: D501-D504
Citations: 1622
Access: Closed

Cite This

APA Style

Kim D. Pruitt (2004). NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research, 33(Database issue), D501-D504. https://doi.org/10.1093/nar/gki025

Identifiers

DOI: 10.1093/nar/gki025