Abstract
Motivation: A common difficulty in large-scale microarray studies is the presence of confounding factors, which may significantly skew estimates of statistical significance, cause unreliable feature selection and lead to high false negative rates. To deal with these difficulties, an algorithmic framework known as Surrogate Variable Analysis (SVA) was recently proposed.
Results: Based on the notion that data can be viewed as an interference pattern, reflecting the superposition of independent effects and random noise, we present a modified SVA, called Independent Surrogate Variable Analysis (ISVA), to identify features correlating with a phenotype of interest in the presence of potential confounding factors. Using simulated data, we show that ISVA performs well in identifying confounders and outperforms methods that do not adjust for confounding. Using four large-scale Illumina Infinium DNA methylation datasets subject to low signal-to-noise ratios and substantial confounding by beadchip effects and variable bisulfite conversion efficiency, we show that ISVA improves the identifiability of confounders and that this enables a framework for feature selection that is more robust to model misspecification and heterogeneous phenotypes. Finally, we demonstrate similar improvements of ISVA across four mRNA expression datasets. Thus, ISVA should be useful as a feature selection tool in studies that are subject to confounding.
Availability: An R package, isva, is available from www.cran.r-project.org.
Contact: a.teschendorff@ucl.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
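The abstract only outlines how ISVA works. The block below is a minimal sketch of one simplified reading of that outline, written in Python with NumPy; it is not the isva R package, not a port of its API, and not necessarily the published algorithm. It assumes the following steps: regress the phenotype out of a features-by-samples matrix, run a small symmetric FastICA (tanh contrast) on the residuals so that each independent component is a vector of feature weights, rebuild each surrogate variable from the unadjusted data restricted to that component's top-loading features (here via the leading singular vector, borrowed from SVA's reconstruction idea; the paper's exact construction may differ), and finally compute per-feature t statistics for the phenotype with the surrogates as covariates. The number of components `k` is supplied by hand rather than estimated, and the names `fastica_symmetric`, `phenotype_t`, `isva_sketch`, `n_top` and the simulated data are all hypothetical.

```python
import numpy as np


def fastica_symmetric(Y, k, n_iter=500, tol=1e-6, seed=0):
    """Minimal symmetric FastICA with a tanh contrast (hypothetical helper).

    Y has shape (n_mixtures, n_obs). Here the mixtures are samples and the
    observations are features, so each recovered source is a vector of
    feature weights. Returns the sources, shape (k, n_obs).
    """
    Yc = Y - Y.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    # Whitening: keep the top-k principal axes and rescale to unit variance.
    K = (U[:, :k] / s[:k]).T * np.sqrt(Yc.shape[1])
    Z = K @ Yc
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.standard_normal((k, k)))[0]  # random orthogonal start
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(Wz) z^T] - diag(E[g'(Wz)]) W
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G**2).mean(axis=1)) @ W
        # Symmetric decorrelation, (W W^T)^(-1/2) W, computed via the SVD.
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W @ Z


def phenotype_t(X, pheno, covariates=()):
    """OLS t statistic of the phenotype coefficient for each row (feature) of X."""
    n = X.shape[1]
    D = np.column_stack([np.ones(n), pheno, *covariates])
    B = np.linalg.lstsq(D, X.T, rcond=None)[0]
    resid = X.T - D @ B
    sigma2 = (resid**2).sum(axis=0) / (n - D.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])
    return B[1] / se


def isva_sketch(X, pheno, k, n_top=200, seed=0):
    """Simplified ISVA-style analysis (hypothetical); X is (n_features, n_samples)."""
    n = X.shape[1]
    D0 = np.column_stack([np.ones(n), pheno])
    # 1. Remove the phenotype effect; the residuals hold confounding plus noise.
    R = X - (D0 @ np.linalg.lstsq(D0, X.T, rcond=None)[0]).T
    # 2. Independent components of the residuals, as feature-weight vectors.
    S = fastica_symmetric(R.T, k, seed=seed)
    # 3. Rebuild each surrogate from the *unadjusted* data on the features that
    #    load most heavily on that component (leading right singular vector).
    Xc = X - X.mean(axis=1, keepdims=True)
    isvs = []
    for weights in S:
        top = np.argsort(-np.abs(weights))[:n_top]
        isvs.append(np.linalg.svd(Xc[top], full_matrices=False)[2][0])
    isvs = np.array(isvs)
    # 4. Test the phenotype with the surrogates as covariates.
    return phenotype_t(X, pheno, isvs), isvs


# Simulated example: features 0-99 depend on the phenotype, 100-399 on a
# confounder c1 correlated with the phenotype, 400-699 on an independent c2.
rng = np.random.default_rng(1)
n, p = 60, 2000
pheno = np.repeat([0.0, 1.0], n // 2)
c1 = pheno + 0.5 * rng.standard_normal(n)
c2 = rng.standard_normal(n)
X = rng.standard_normal((p, n))
X[:100] += 1.0 * pheno
X[100:400] += 2.0 * c1
X[400:700] += 2.0 * c2

t_adj, isvs = isva_sketch(X, pheno, k=2)
t_raw = phenotype_t(X, pheno)


def true_hits(t, top=100):
    """How many of the `top` highest-|t| features are truly phenotype-associated."""
    return int(np.sum(np.argsort(-np.abs(t))[:top] < 100))


print("unadjusted:", true_hits(t_raw), "adjusted:", true_hits(t_adj))
```

In this simulated design the confounder `c1` is correlated with the phenotype, so the unadjusted t statistics tend to rank many confounded features above the true ones, while adjusting for the reconstructed surrogates should recover more of the 100 true features among the top 100. Rebuilding the surrogates from the unadjusted data is the important design choice: surrogates taken directly from the residuals are orthogonal to the intercept and the phenotype, so adding them as covariates would leave the estimated phenotype coefficient unchanged and remove no confounding bias.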
Publication Info
- Year: 2011
- Type: article
- Volume: 27
- Issue: 11
- Pages: 1496-1505
- Citations: 269
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/btr171