Abstract
Motivation: In RNA-seq differential expression analysis, investigators aim to detect those genes with changes in expression level across conditions, despite technical and biological variability in the observations. A common task is to accurately estimate the effect size, often in terms of a logarithmic fold change (LFC).

Results: When the read counts are low or highly variable, the maximum likelihood estimates for the LFCs have high variance, leading to large estimates that are not representative of true differences and to poor ranking of genes by effect size. One approach is to introduce filtering thresholds and pseudocounts to exclude or moderate estimated LFCs. Filtering may remove genes with true differences in expression from the analysis, while pseudocounts provide a limited solution that must be adapted per dataset. Here, we propose the use of a heavy-tailed Cauchy prior distribution for effect sizes, which avoids the use of filter thresholds or pseudocounts. The proposed method, Approximate Posterior Estimation for generalized linear model (apeglm), has lower bias than previously proposed shrinkage estimators, while still reducing variance for those genes with little information for statistical inference.

Availability and implementation: The apeglm package is available as an R/Bioconductor package at https://bioconductor.org/packages/apeglm, and the methods can be called from within the DESeq2 software.

Supplementary information: Supplementary data are available at Bioinformatics online.
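The shrinkage idea described in the abstract above can be illustrated with a small sketch. This is not the apeglm implementation: it assumes a simplified two-group Poisson model (apeglm works with generalized linear models for RNA-seq counts), a fixed prior scale chosen by hand, and a plain grid search for the optimum. The function names `profile_loglik`, `log_cauchy_prior` and `estimate_lfc` are hypothetical and exist only for this example. The LFC here is on the natural-log scale.

```python
import math

def profile_loglik(beta, counts_a, counts_b):
    """Poisson log-likelihood of the log fold change beta (B versus A).

    The baseline mean is replaced by its maximum likelihood value for the
    given beta; terms that do not depend on beta are dropped.
    """
    sum_a, sum_b = sum(counts_a), sum(counts_b)
    n_a, n_b = len(counts_a), len(counts_b)
    total = sum_a + sum_b
    mu = total / (n_a + n_b * math.exp(beta))  # best baseline mean for this beta
    return total * math.log(mu) + sum_b * beta - total

def log_cauchy_prior(beta, scale):
    """Log density of a zero-centered Cauchy prior, up to a constant."""
    return -math.log(1.0 + (beta / scale) ** 2)

def estimate_lfc(counts_a, counts_b, prior_scale=None, bound=10.0, n_steps=20000):
    """Grid-search estimate of beta on [-bound, bound].

    prior_scale=None gives the maximum likelihood estimate (restricted to the
    grid); a number gives the posterior mode under a Cauchy prior with that
    scale (an assumed, fixed value in this sketch).
    """
    if sum(counts_a) + sum(counts_b) == 0:
        raise ValueError("no reads in either condition")

    def objective(beta):
        value = profile_loglik(beta, counts_a, counts_b)
        if prior_scale is not None:
            value += log_cauchy_prior(beta, prior_scale)
        return value

    grid = [-bound + 2.0 * bound * i / n_steps for i in range(n_steps + 1)]
    return max(grid, key=objective)

# Made-up counts: one gene with very few reads, one with many.
low_a, low_b = [0, 0, 0], [1, 0, 2]
high_a, high_b = [1000, 1100, 900], [2000, 2200, 1800]

for name, a, b in [("low counts", low_a, low_b), ("high counts", high_a, high_b)]:
    mle = estimate_lfc(a, b)
    shrunk = estimate_lfc(a, b, prior_scale=1.0)
    print(f"{name}: MLE = {mle:.3f}, Cauchy-prior mode = {shrunk:.3f}")
```

With these made-up counts, the low-count gene has an unbounded maximum likelihood estimate (the grid search simply returns the upper bound), while the prior pulls the estimate back to a moderate value. For the high-count gene the likelihood dominates and the two estimates nearly coincide, which is the behaviour a heavy-tailed prior is meant to give for well-supported large effects.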
Publication Info
- Year: 2018
- Type: article
- Volume: 35
- Issue: 12
- Pages: 2084-2092
- Citations: 1978
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/bty895
- PMID: 30395178
- PMCID: PMC6581436