Abstract

Motivation: A number of penalization and shrinkage approaches have been proposed for the analysis of microarray gene expression data. Similar techniques are now routinely applied to RNA sequence transcriptional count data, although the value of such shrinkage has not been conclusively established. If penalization is desired, the explicit modeling of mean–variance relationships provides a flexible testing regimen that ‘borrows’ information across genes, while easily incorporating design effects and additional covariates.

Results: We describe BBSeq, which incorporates two approaches: (i) a simple beta-binomial generalized linear model, which has not been extensively tested for RNA-Seq data and (ii) an extension of an expression mean–variance modeling approach to RNA-Seq data, involving modeling of the overdispersion as a function of the mean. Our approaches are flexible, allowing for general handling of discrete experimental factors and continuous covariates. We report comparisons with alternative methods for handling RNA-Seq data. Although penalized methods have advantages for very small sample sizes, the beta-binomial generalized linear model, combined with simple outlier detection and testing approaches, appears to have favorable characteristics in power and flexibility.

Availability: An R package containing examples and sample datasets is available at http://www.bios.unc.edu/research/genomic_software/BBSeq

Contact: yzhou@bios.unc.edu; fwright@bios.unc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
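To make the beta-binomial generalized linear model named in the abstract more concrete, the following is a minimal Python sketch of a per-gene likelihood. It is not the BBSeq implementation (BBSeq is an R package). The parameterization by a mean proportion `p` and an overdispersion `phi`, the logit link, and all function names and toy numbers are assumptions made for illustration only.

```python
import math

def betabinom_logpmf(y, n, p, phi):
    """Log-probability of y reads out of n under a beta-binomial with
    mean proportion p and overdispersion phi in (0, 1).
    Assumed parameterization: a = p(1-phi)/phi, b = (1-p)(1-phi)/phi,
    so that Var(y) = n p (1-p) (1 + (n-1) phi)."""
    a = p * (1.0 - phi) / phi
    b = (1.0 - p) * (1.0 - phi) / phi
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    log_beta_ratio = (math.lgamma(y + a) + math.lgamma(n - y + b)
                      - math.lgamma(n + a + b)
                      - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))
    return log_choose + log_beta_ratio

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def neg_loglik(params, counts, totals, design):
    """Negative log-likelihood for one gene.
    params holds the regression coefficients followed by logit(phi);
    design holds one covariate row per sample (assumed logit link)."""
    *coefs, logit_phi = params
    phi = logistic(logit_phi)
    total = 0.0
    for y, n, row in zip(counts, totals, design):
        p = logistic(sum(c * x for c, x in zip(coefs, row)))
        total -= betabinom_logpmf(y, n, p, phi)
    return total

# Hypothetical toy data: four samples, intercept plus a two-group factor.
counts = [120, 135, 260, 240]                    # reads mapped to the gene
totals = [1_000_000, 1_100_000, 950_000, 1_050_000]  # total mapped reads
design = [[1, 0], [1, 0], [1, 1], [1, 1]]
value = neg_loglik([-9.0, 0.7, -3.0], counts, totals, design)
```

Minimizing `neg_loglik` over `params` with any general-purpose optimizer would give maximum likelihood estimates under these assumptions. Discrete factors and continuous covariates enter only through the rows of `design`, which is what makes the generalized linear model formulation flexible.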

Keywords

Count data; Covariate; Negative binomial distribution; Computer science; Overdispersion; Data mining; Outlier; Sample size determination; Generalized linear mixed model; Mathematics; Statistics; Artificial intelligence; Machine learning; Poisson distribution

Publication Info

Year: 2011
Type: article
Volume: 27
Issue: 19
Pages: 2672-2678
Citations: 130
Access: Closed

Citation Metrics

130 (OpenAlex)

Cite This

Yi‐Hui Zhou, Kai Xia, Fred A. Wright (2011). A powerful and flexible approach to the analysis of RNA sequence count data. Bioinformatics, 27(19), 2672-2678. https://doi.org/10.1093/bioinformatics/btr449

Identifiers

DOI: 10.1093/bioinformatics/btr449