Abstract
The analysis of large-scale gene expression data is a fundamental approach to functional genomics and to the identification of potential drug targets. Results derived from such studies cannot be trusted unless the studies are adequately designed and reported. The purpose of this study is to assess current practices in reporting experimental design and statistical analyses in gene expression-based studies.
We reviewed hundreds of MEDLINE-indexed papers involving gene expression data analysis that were published between 2003 and 2005. These papers were examined for their reporting of several factors, such as sample size, statistical power and software availability.
Among the examined papers, we concentrated on 293 papers presenting applications and new methodologies. None of these papers reported approaches to sample size or statistical power estimation. Explicit statements on data transformation and descriptions of the normalisation techniques applied prior to data analysis (e.g. classification) were missing from 57 (37.5%) and 104 (68.4%) of the methodology papers, respectively. Among papers presenting biomedically relevant applications, 41 (29.1%) did not report on data normalisation and 83 (58.9%) did not describe the normalisation technique applied. Clustering-based analysis, the t-test and ANOVA are the most widely applied techniques in microarray data analysis. Remarkably, however, only 5 (3.5%) of the application papers included statements on, or references to, the variance homogeneity assumption underlying the t-test and ANOVA. There is still a need to promote reporting of the software packages applied and of their availability.
Recently published gene expression data analysis studies may lack key information required to properly assess their design quality and potential impact.
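The variance-homogeneity finding can be made concrete with a small example. The following is a minimal sketch, not the authors' method: it compares Student's pooled-variance t statistic, which assumes equal group variances, with Welch's t statistic and Welch-Satterthwaite degrees of freedom, which do not. The function names and the expression values are hypothetical, chosen only for illustration, and p-values are omitted so that the sketch needs only the Python standard library.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student's two-sample t statistic; assumes equal group variances."""
    nx, ny = len(x), len(y)
    # Pooled variance estimate shared by both groups.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom;
    does not assume equal group variances."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def variance_ratio(x, y):
    """Larger sample variance divided by the smaller one (a crude screen)."""
    a, b = variance(x), variance(y)
    return max(a, b) / min(a, b)


if __name__ == "__main__":
    # Hypothetical log2 expression values for one gene in two groups.
    control = [7.1, 7.3, 6.9, 7.2, 7.0]
    treated = [8.4, 9.9, 7.6, 10.2, 8.8]
    print("variance ratio:", round(variance_ratio(control, treated), 1))
    print("pooled t, df:", pooled_t(control, treated))
    print("Welch t, df:", welch_t(control, treated))
```

With equal group sizes, as here, the two t statistics coincide; the difference lies in the degrees of freedom, which drop from 8 to roughly 4.2 under Welch's correction, giving a more conservative test when the variances clearly differ. The variance ratio is only a rough screen; a formal check such as Levene's test would normally be reported alongside the choice of test.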
There is a need for more rigorous reporting of important experimental factors such as statistical power and sample size, as well as for the correct description and justification of the statistical methods applied. This paper highlights the importance of defining a minimum set of information required for reporting the statistical design and analysis of expression data. By improving the reporting of statistical analyses, the scientific community can facilitate quality assurance and peer review, as well as the reproducibility of results.
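Because none of the reviewed papers reported how sample size or power was estimated, a worked example may help. The sketch below assumes a two-sided, two-sample comparison of means with a common standard deviation and uses the normal approximation n = 2(z_{1-α/2} + z_{1-β})² / d², where d is the mean difference divided by the standard deviation. It is an illustration under these assumptions, not a procedure taken from the paper, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample
    comparison of means (normal approximation):
        n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2
    where d = mean difference / common standard deviation."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = z.inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def achieved_power(n, effect_size, alpha=0.05):
    """Approximate power with n samples per group (normal approximation,
    ignoring the negligible rejection tail on the opposite side)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n / 2) - z_alpha)


if __name__ == "__main__":
    for d in (1.0, 0.5):
        n = n_per_group(d)
        print(f"d={d}: n per group={n}, power={achieved_power(n, d):.3f}")
```

For example, `n_per_group(1.0)` returns 16 for α = 0.05 and 80% power. Exact t-based calculations give slightly larger values, and microarray studies that test thousands of genes at once usually need a per-gene α adjusted for multiple testing (for instance a Bonferroni-style α/m), which raises the required sample size considerably.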
Publication Info
- Year: 2006
- Type: review
- Volume: 6
- Issue: 1
- Pages: 27
- Citations: 156
- Access: Closed
Identifiers
- DOI: 10.1186/1472-6947-6-27
- PMID: 16790051
- PMCID: PMC1523197