Abstract
Motivation: A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean–variance relationships provides a flexible testing regimen that ‘borrows’ information across genes, while easily incorporating design effects and additional covariates.

Results: We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data and (ii) an extension of an expression mean–variance modeling approach to RNA-Seq data, involving modeling of the overdispersion as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with other alternate methods to handle RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility.

Availability: An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq

Contact: yzhou@bios.unc.edu; fwright@bios.unc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
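To make the idea of a beta-binomial generalized linear model concrete, the following is a minimal Python sketch of a per-gene log-likelihood. It is not the BBSeq implementation, which is an R package. The parameterization (mean proportion `p`, overdispersion `phi` in (0, 1)), the logit link, and all function and variable names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def beta_binomial_logpmf(y, n, p, phi):
    """Log-probability of y reads out of n under a beta-binomial.

    Assumed parameterization (illustrative, not taken from the source):
    mean proportion p and overdispersion phi in (0, 1), so that
    Var(Y) = n * p * (1 - p) * (1 + (n - 1) * phi).
    """
    a = p * (1.0 - phi) / phi          # beta shape alpha
    b = (1.0 - p) * (1.0 - phi) / phi  # beta shape beta
    log_choose = (math.lgamma(n + 1) - math.lgamma(y + 1)
                  - math.lgamma(n - y + 1))
    log_beta_num = (math.lgamma(y + a) + math.lgamma(n - y + b)
                    - math.lgamma(n + a + b))
    log_beta_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_beta_num - log_beta_den


def logistic(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def gene_loglik(counts, totals, design, coefs, phi):
    """Log-likelihood for one gene across samples.

    counts: reads mapped to the gene in each sample
    totals: total mapped reads in each sample
    design: one row of covariates per sample (first entry 1 for an intercept)
    coefs:  regression coefficients on the logit scale (hypothetical names)
    """
    ll = 0.0
    for y, n, x in zip(counts, totals, design):
        eta = sum(beta * xi for beta, xi in zip(coefs, x))
        ll += beta_binomial_logpmf(y, n, logistic(eta), phi)
    return ll
```

In a sketch like this, a two-group comparison would use design rows such as `[1, 0]` and `[1, 1]`, and a test of differential expression would compare the maximized log-likelihood with and without the second coefficient. Under the second approach described in the abstract, `phi` would be tied to the fitted mean across genes instead of being estimated freely for each gene.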
Publication Info
- Year: 2011
- Type: article
- Volume: 27
- Issue: 19
- Pages: 2672-2678
- Citations: 130
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/btr449