Phased diploid genome assembly with single-molecule real-time sequencing

2016 Nature Methods 2,138 citations

Abstract

While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.

Keywords

Ploidy; Genome; Biology; Genetics; Haplotype; Sequence assembly; Structural variation; Computational biology; Gene; Allele

MeSH Terms

Algorithms; Arabidopsis; Basidiomycota; DNA, Fungal; DNA, Plant; Diploidy; Genome, Fungal; Genome, Plant; Genomics; Haplotypes; Heterozygote; Humans; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Vitis

Publication Info

Year: 2016
Type: article
Volume: 13
Issue: 12
Pages: 1050-1054
Citations: 2138
Access: Closed

Citation Metrics

2138 (OpenAlex)

Cite This

Chen-Shan Chin, Paul Peluso, Fritz J. Sedlazeck et al. (2016). Phased diploid genome assembly with single-molecule real-time sequencing. Nature Methods, 13(12), 1050-1054. https://doi.org/10.1038/nmeth.4035

Identifiers

DOI: 10.1038/nmeth.4035
PMID: 27749838
PMCID: PMC5503144
