NextPolish: a fast and efficient genome polishing tool for long-read assembly

Abstract

Motivation: Although long-read sequencing technologies can produce genomes with long contiguity, they suffer from high error rates. Thus, we developed NextPolish, a tool that efficiently corrects sequence errors in genomes assembled with long reads. This new tool consists of two interlinked modules that are designed to score and count k-mers from high-quality short reads, and to polish genome assemblies containing large numbers of base errors.

Results: When evaluated for speed and efficiency using human and plant (Arabidopsis thaliana) genomes, NextPolish outperformed Pilon by correcting sequence errors faster and with higher correction accuracy.

Availability and implementation: NextPolish is implemented in C and Python. The source code is available from https://github.com/Nextomics/NextPolish.

Supplementary information: Supplementary data are available at Bioinformatics online.
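The idea of counting k-mers in short reads and using those counts to score candidate sequences can be illustrated with a minimal Python sketch. This is not NextPolish's implementation: the function names `count_kmers` and `score_candidates`, the toy reads, and the simple summed-count score are all assumptions chosen for illustration only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-length substring across the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def score_candidates(candidates, counts, k):
    """Score each candidate by the summed read support of its k-mers.

    Hypothetical scoring rule for illustration; a k-mer absent from
    the reads contributes 0.
    """
    scores = {}
    for cand in candidates:
        scores[cand] = sum(
            counts[cand[i:i + k]] for i in range(len(cand) - k + 1)
        )
    return scores


# Toy short reads (made up) and two candidate versions of a region:
# one consistent with the reads, one carrying a single-base error.
reads = ["ACGTAC", "CGTACG", "GTACGT"]
counts = count_kmers(reads, k=3)
scores = score_candidates(["ACGTAC", "ACTTAC"], counts, k=3)
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy case the candidate whose k-mers are all supported by the reads scores higher than the one with a substituted base, which is the intuition behind using short-read k-mer support to choose between alternative corrections.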

Keywords

Python (programming language), Computer science, Genome, Sequence assembly, k-mer, Source code, Software, Contiguity, Computational biology, Biology, Programming language, Genetics, Operating system

Related Publications

Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies

Arang Rhie, Brian P. Walenz, Sergey Koren +1 more

Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for re...

2020 Genome biology 2863 citations

Identifying and removing haplotypic duplication in primary genome assemblies

Dengfeng Guan, Shane McCarthy, Jonathan Wood +3 more

Motivation: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic ge...

2020 Bioinformatics 2530 citations

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Binghang Liu, Jianying Yuan, Siu‐Ming Yiu +8 more

Motivation: The boost of next-generation sequencing technologies provides us with an unprecedented opportunity for elucidating genetic mysteries, yet the short-read len...

2012 Bioinformatics 145 citations

Nanopore sequencing and assembly of a human genome with ultra-long reads

Miten Jain, Sergey Koren, Karen H. Miga +23 more

We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb...

2018 Nature Biotechnology 1952 citations

Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology

Thomas D. Otto, Mandy Sanders, Matthew Berriman +1 more

Motivation: The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, w...

2010 Bioinformatics 222 citations

Publication Info

Year: 2019
Type: article
Volume: 36
Issue: 7
Pages: 2253-2255
Citations: 1183
Access: Closed

Cite This

APA Style

Jiang Hu, Junpeng Fan, Zongyi Sun, et al. (2019). NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics, 36(7), 2253-2255. https://doi.org/10.1093/bioinformatics/btz891

Identifiers

DOI: 10.1093/bioinformatics/btz891