Moderated statistical tests for assessing differences in tag abundance

Abstract

Motivation: Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of sequencing DNA decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of these are usable when the number of replicates is very small.

Results: We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. Not only is our strategy applicable even with the smallest number of libraries, but it also proves to be more powerful than previous strategies when more libraries are available. The methodology is equally applicable to other counting technologies, such as proteomic spectral counts.

Availability: An R package can be accessed from http://bioinf.wehi.edu.au/resources/

Contact: smyth@wehi.edu.au

Supplementary information: http://bioinf.wehi.edu.au/resources/
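The abstract compresses several steps: a negative binomial model whose variance mu + phi * mu^2 exceeds the Poisson variance mu, a likelihood for the dispersion phi that conditions out the mean, moderation of phi across genes, and a test for a difference in abundance. The Python sketch here illustrates those ideas under stated assumptions; it is not the authors' R package and should not be read as their exact method. Assumptions: all libraries have equal size; phi is picked from a fixed grid rather than optimized; the weighting takes the form l_g(phi) + alpha * (average of l_k(phi) over all genes k) with alpha supplied by the user; the exact test plugs in the pooled mean and sums the probabilities of every split of the total count that is no more likely than the observed split. All function names are hypothetical.

```python
# Sketch only: not the authors' R package. All function names are
# hypothetical; equal library sizes, a fixed dispersion grid, and a
# user-chosen weight alpha are assumptions made for illustration.
from math import exp, lgamma, log


def nb_logpmf(y, mu, phi):
    """Log NB probability with mean mu and variance mu + phi * mu**2."""
    r = 1.0 / phi  # NB "size"; phi -> 0 recovers the Poisson
    return (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
            + r * log(r / (r + mu)) + y * log(mu / (r + mu)))


def cond_loglik(counts, phi):
    """Log-likelihood of phi for one group, conditional on the group sum.

    With equal library sizes the sum is sufficient for the mean, so this
    does not depend on mu. Terms constant in phi are dropped.
    """
    r = 1.0 / phi
    n, z = len(counts), sum(counts)
    return (sum(lgamma(y + r) - lgamma(r) for y in counts)
            + lgamma(n * r) - lgamma(z + n * r))


def gene_loglik(groups, phi):
    """Conditional log-likelihood of phi summed over a gene's groups."""
    return sum(cond_loglik(g, phi) for g in groups)


def moderated_dispersion(genes, alpha, grid):
    """Per-gene phi maximising l_g(phi) + alpha * mean_k l_k(phi) on a grid.

    `genes` is a list of genes, each a list of groups of counts.
    alpha = 0 gives gene-wise estimates; a large alpha pulls every gene
    towards the common (average-likelihood) dispersion.
    """
    common = {phi: sum(gene_loglik(g, phi) for g in genes) / len(genes)
              for phi in grid}
    return [max(grid, key=lambda phi: gene_loglik(g, phi) + alpha * common[phi])
            for g in genes]


def exact_test(y1, y2, phi):
    """Two-sided exact test of equal means, conditioning on the total count."""
    n1, n2 = len(y1), len(y2)
    z1, z = sum(y1), sum(y1) + sum(y2)
    if z == 0:
        return 1.0
    mu = z / (n1 + n2)  # pooled per-library mean under the null
    # A sum of n iid NB(mu, phi) counts is NB(n * mu, phi / n).
    logp = [nb_logpmf(a, n1 * mu, phi / n1) + nb_logpmf(z - a, n2 * mu, phi / n2)
            for a in range(z + 1)]
    top = max(logp)
    probs = [exp(v - top) for v in logp]  # rescale to avoid underflow
    obs = probs[z1]
    # Sum the probabilities of all splits no more likely than the observed one.
    p = sum(q for q in probs if q <= obs * (1 + 1e-7)) / sum(probs)
    return min(1.0, p)


genes = [[[10, 12], [11, 9]], [[100, 110], [5, 4]]]
phis = moderated_dispersion(genes, alpha=10.0, grid=[0.01, 0.05, 0.1, 0.5])
for g, phi in zip(genes, phis):
    print(phi, exact_test(g[0], g[1], phi))
```

Setting alpha = 0 gives purely gene-wise estimates, while increasing alpha shrinks every gene's estimate toward the common value; how alpha should be chosen is left open in this sketch.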

Keywords

Overdispersion, Negative binomial distribution, Poisson distribution, Count data, Binomial distribution, Biology, Computer science, Statistics, Computational biology, Mathematics

MeSH Terms

Algorithms; Computer Simulation; Data Interpretation, Statistical; Expressed Sequence Tags; Gene Expression Profiling; Likelihood Functions; Models, Genetic; Models, Statistical; Poisson Distribution; Sequence Analysis, DNA; Signal Processing, Computer-Assisted

Publication Info

Year: 2007
Type: article
Volume: 23
Issue: 21
Pages: 2881-2887
Citations: 906
Access: Closed

Citation Metrics

OpenAlex: 906 citations (marked "Influential")
CrossRef: 734 citations

Cite This

APA Style

Robinson, M. D., & Smyth, G. K. (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23(21), 2881-2887. https://doi.org/10.1093/bioinformatics/btm453

Identifiers

DOI: 10.1093/bioinformatics/btm453
PMID: 17881408
