Abstract
Data from expression arrays must be comparable before they can be analyzed rigorously on a large scale. Accurate normalization improves the comparability of expression data because it accounts for sources of variation that obscure the underlying variation of interest. Undesirable variation in reported expression levels originates in the preparation and hybridization of the sample as well as in the manufacture of the array itself, and it may differ depending on the array technology employed. Published research to date has not characterized the degree of variation associated with these sources, and results are often reported without tight statistical bounds on their significance. We analyze the distributions of reported levels of exogenous control species spiked into samples applied to 1280 Affymetrix arrays. We develop a model that explains reported expression levels under the assumption that variation is primarily multiplicative. To compute the scaling factors needed for normalization, we derive maximum likelihood and maximum a posteriori estimates of the parameters characterizing the multiplicative variation in reported spiked control expression levels. We conclude that the optimal scaling factors in this context are weighted geometric means, and we determine the appropriate weights. The resulting optimal scaling factor estimates can then be used for subsequent array normalization.
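Why a weighted geometric mean is plausible can be illustrated with a minimal sketch, assuming a log-normal multiplicative model that is not taken from the paper: each reported control level is y_i = s * c_i * exp(e_i), where c_i is a known spiked-in amount, s is the array's scaling factor, and e_i ~ N(0, v_i) with a known per-control log-variance v_i. Under that assumption the maximum likelihood estimate of log s is the inverse-variance weighted mean of log(y_i / c_i), so s itself is a weighted geometric mean of the ratios. Adding a hypothetical normal prior on log s gives a precision-weighted maximum a posteriori estimate. The function names and the 1/v_i weights are illustrative assumptions, not the weights derived by the authors.

```python
import math
from typing import Sequence


def _log_ratios_and_weights(observed: Sequence[float], expected: Sequence[float],
                            log_vars: Sequence[float]):
    """Return log(y_i / c_i) and inverse-variance weights 1 / v_i (assumed model)."""
    if not (len(observed) == len(expected) == len(log_vars)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(x <= 0 for x in list(observed) + list(expected) + list(log_vars)):
        raise ValueError("levels and variances must be positive")
    residuals = [math.log(y) - math.log(c) for y, c in zip(observed, expected)]
    weights = [1.0 / v for v in log_vars]
    return residuals, weights


def mle_scale(observed, expected, log_vars) -> float:
    """MLE of s: weighted geometric mean of y_i / c_i with weights 1 / v_i."""
    residuals, weights = _log_ratios_and_weights(observed, expected, log_vars)
    log_s = sum(w * r for w, r in zip(weights, residuals)) / sum(weights)
    return math.exp(log_s)


def map_scale(observed, expected, log_vars,
              prior_log_mean: float, prior_log_var: float) -> float:
    """MAP of s with a hypothetical N(prior_log_mean, prior_log_var) prior on log s."""
    residuals, weights = _log_ratios_and_weights(observed, expected, log_vars)
    # The prior acts like one extra "observation" of log s with weight 1 / prior_log_var.
    numerator = prior_log_mean / prior_log_var + sum(w * r for w, r in zip(weights, residuals))
    denominator = 1.0 / prior_log_var + sum(weights)
    return math.exp(numerator / denominator)


if __name__ == "__main__":
    # Two controls whose observed/expected ratios are 2 and 8, with equal variance.
    print(mle_scale([2.0, 8.0], [1.0, 1.0], [1.0, 1.0]))
```

With equal variances the estimate reduces to the plain geometric mean (here sqrt(2 * 8) = 4). Noisier controls receive less weight, and as the prior variance grows the MAP estimate approaches the MLE.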
Publication Info
- Year: 2001
- Type: article
- Volume: 4266
- Pages: 132-140
- Citations: 92
- Access: Closed
Identifiers
- DOI: 10.1117/12.427981