Abstract

The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, providing data storage and sharing services for scientific communities worldwide. In response to explosive data growth and increasingly diverse data types, here we present the GSA family, which expands GSA into a set of raw data archives serving different purposes: GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and Open Archive for Miscellaneous Data (OMIX, https://ngdc.cncb.ac.cn/omix/). Compared with the 2017 version, GSA has been significantly updated in its data model, online functionalities, and web interfaces. GSA-Human, a new partner of GSA, is a repository specialized in human genetics-related data, with controlled access and security. OMIX, a critical complement to these two resources, is an open archive for miscellaneous data. Together, these resources form a family dedicated to archiving rapidly growing data of diverse types, accepting data submissions from all over the world, and providing free open access to all publicly available data in support of research activities worldwide.

Keywords

Raw data; Computer science; Data sharing; Data set; Information repository; Data access; Data science; World Wide Web; Database; Computer data storage; Medicine

MeSH Terms

Databases, Genetic; Explosive Agents; Genome, Human; Genomics; Humans; Information Storage and Retrieval

Publication Info

Year: 2021
Type: article
Volume: 19
Issue: 4
Pages: 578-583
Citations: 1662
Access: Closed

Citation Metrics

OpenAlex: 1662
Influential: 17
CrossRef: 1346

Cite This

Tingting Chen, Xu Chen, Sisi Zhang et al. (2021). The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genomics, Proteomics & Bioinformatics, 19(4), 578-583. https://doi.org/10.1016/j.gpb.2021.08.001

Identifiers

DOI: 10.1016/j.gpb.2021.08.001
PMID: 34400360
PMCID: PMC9039563
