Abstract
Background
An important challenge for transcript counting methods such as Serial Analysis of Gene Expression (SAGE), "Digital Northern", or Massively Parallel Signature Sequencing (MPSS) is to carry out statistical analyses that account for within-class variability, i.e., variability due to intrinsic biological differences among sampled individuals of the same class, and not only for variability due to technical sampling error.

Results
We introduce a Bayesian model that accounts for within-class variability by means of a mixture distribution. We show that the previously available approaches, aggregation in pools ("pseudo-libraries") and the Beta-Binomial model, are particular cases of the mixture model. We illustrate our method with a brain tumor vs. normal comparison using SAGE data from public databases. We show examples of tags that appear differentially expressed with high significance when within-class variability is ignored, but are clearly much less significant once it is accounted for.

Conclusion
Using available information about biological replicates, one can transform a list of candidate transcripts showing differential expression into a more reliable one. Our method is freely available under the GPL/GNU copyleft license, either through a user-friendly web-based online tool or as R language scripts on the supplemental website.
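The abstract does not state the model's equations, so the following is only a minimal sketch, not the authors' mixture model or their R scripts. It contrasts the pooled ("pseudo-library") view, in which every library of a class shares one tag proportion p, with a Beta-Binomial, which is itself a binomial mixed over a Beta distribution of per-library proportions. The parameterization by a mean and a "precision" (a + b), and all numbers in the demo (library size, proportion, counts), are hypothetical choices for illustration.

```python
from math import lgamma, log, log1p


def log_choose(n: int, k: int) -> float:
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_logpmf(k: int, n: int, p: float) -> float:
    """Pooled view: every library in the class shares the same tag proportion p,
    so only technical sampling error is modeled."""
    return log_choose(n, k) + k * log(p) + (n - k) * log1p(-p)


def beta_binomial_logpmf(k: int, n: int, mean: float, precision: float) -> float:
    """Each library draws its own proportion from Beta(a, b), which is then
    integrated out. Hypothetical parameterization: a = mean * precision,
    b = (1 - mean) * precision. Smaller precision = more within-class variability."""
    a = mean * precision
    b = (1.0 - mean) * precision
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)


if __name__ == "__main__":
    n = 50_000  # hypothetical library size (total tags sequenced)
    p = 1e-4    # hypothetical mean tag proportion, about 5 expected counts
    k = 30      # an unusually high count for this tag in one library

    print("binomial:", binomial_logpmf(k, n, p))
    for precision in (1e2, 1e4, 1e9):
        print(f"beta-binomial, precision={precision:g}:",
              beta_binomial_logpmf(k, n, p, precision))
```

As the precision grows, the Beta distribution concentrates on its mean and the Beta-Binomial approaches the pooled binomial, so pooling behaves like the limit of no within-class variability. With a small precision, an unusually high count in one library is far less surprising, which illustrates why ignoring within-class variability can overstate the significance of a tag.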
Publication Info
- Year: 2004
- Type: article
- Volume: 5
- Issue: 1
- Pages: 119
- Citations: 66
- Access: Closed
Identifiers
- DOI: 10.1186/1471-2105-5-119
- PMID: 15339345
- PMCID: PMC517707