PyRAD: assembly of de novo RADseq loci for phylogenetic analyses

Deren A. R. Eaton

doi:10.1093/bioinformatics/btu121

Abstract

Motivation: Restriction-site–associated genomic markers are a powerful tool for investigating evolutionary questions at the population level, but they are limited in their utility at deeper phylogenetic scales, where fewer orthologous loci are typically recovered across disparate taxa. This limitation stems in part from mutations to restriction recognition sites that disrupt data generation, but an additional source of data loss is the failure to identify homology during bioinformatic analyses. Clustering methods that allow for lower similarity thresholds and the inclusion of indel variation will perform better at assembling RADseq loci at the phylogenetic scale.

Results: PyRAD is a pipeline to assemble de novo RADseq loci with the aim of optimizing coverage across phylogenetic datasets. It uses a wrapper around an alignment-clustering algorithm that allows for indel variation within and between samples, as well as for incomplete overlap among reads (e.g. paired-end reads). Here I compare the performance of PyRAD and the program Stacks in analyzing a simulated RADseq dataset that includes indel variation. Indels disrupt the clustering of homologous loci in Stacks but not in PyRAD, such that the latter recovers more shared loci across disparate taxa. Through reanalysis of an empirical RADseq dataset, I show that indels are a common feature of such data, even at shallow phylogenetic scales. PyRAD uses parallel processing and an optional hierarchical clustering method, which together allow it to rapidly assemble phylogenetic datasets with hundreds of sampled individuals.

Availability: Software is written in Python and freely available at http://www.dereneaton.com/software/

Contact: daeaton.chicago@gmail.com

Supplementary Information: Supplementary data are available at Bioinformatics online.
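The central technical point of the abstract, that indels break clustering methods which compare reads without gaps, can be illustrated with a small sketch. The Python below is hypothetical and is not PyRAD or Stacks code: an ungapped, position-by-position comparison stands in for mismatch-only clustering, Levenshtein edit distance stands in for a gapped alignment, and the example read and the 0.85 similarity threshold are arbitrary choices. A single 1-bp deletion shifts every downstream base, so the ungapped similarity collapses while the gapped similarity stays high. A simple greedy centroid clustering step (also an assumption, not PyRAD's actual procedure) shows how that difference splits or joins two alleles of the same locus.

```python
"""Illustrative sketch: how a 1-bp indel affects ungapped vs gapped read similarity.

Hypothetical example, not PyRAD or Stacks code. Edit (Levenshtein) distance
stands in for a gapped alignment; the 0.85 threshold is an arbitrary choice.
"""


def hamming_similarity(a: str, b: str) -> float:
    """Ungapped comparison: fraction of position-by-position matches.

    Overhanging bases of the longer read count as mismatches.
    """
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion from a
                curr[j - 1] + 1,         # insertion into a
                prev[j - 1] + (x != y),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def gapped_similarity(a: str, b: str) -> float:
    """Gapped comparison: 1 - edit distance / length of the longer read."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(reads, threshold, similarity):
    """Hypothetical greedy clustering: add each read to the first cluster whose
    centroid (first member) it matches at >= threshold, else start a new cluster."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if similarity(cluster[0], read) >= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


if __name__ == "__main__":
    allele_1 = "TGCAGTCGATCCAGTTGACCATGAGCTTAGGCAATCGTAC"
    allele_2 = allele_1[:7] + allele_1[8:]  # same locus with a 1-bp deletion
    threshold = 0.85  # hypothetical clustering threshold
    pair = [allele_1, allele_2]
    print("ungapped similarity:", hamming_similarity(allele_1, allele_2))
    print("gapped similarity:  ", gapped_similarity(allele_1, allele_2))
    print("ungapped clusters:", len(greedy_cluster(pair, threshold, hamming_similarity)))
    print("gapped clusters:  ", len(greedy_cluster(pair, threshold, gapped_similarity)))
```

Run as a script, it prints both similarities and the number of clusters each comparison produces for the two alleles. With these inputs the ungapped comparison places the alleles in separate clusters, while the gapped comparison places them in one, mirroring the loss of homologous loci that the abstract attributes to methods that do not model indels.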

Keywords

Indel, Phylogenetic tree, Biology, Cluster analysis, Computational biology, Population, Evolutionary biology, Genetics, Computer science, Artificial intelligence, Gene

Publication Info

Year: 2014
Type: article
Volume: 30
Issue: 13
Pages: 1844-1849
Citations: 741
Access: Closed

Cite This

APA Style

Deren A. R. Eaton (2014). PyRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics, 30(13), 1844-1849. https://doi.org/10.1093/bioinformatics/btu121
