Significance, Errors, Power, and Sample Size: The Blocking and Tackling of Statistics

Abstract

Inferential statistics relies heavily on the central limit theorem and the related law of large numbers. According to the central limit theorem, regardless of the distribution of the source population, the sampling distribution of a sample estimate such as the mean will be approximately normal, provided the sample is large enough. The related law of large numbers holds that a sample estimate converges to the true population value as random samples become larger; in practice, n ≥ 30 is a common rule of thumb for when the normal approximation is adequate. In research-related hypothesis testing, the term “statistically significant” is used to describe an observed difference or association that has met a certain threshold. This significance threshold or cut-point is denoted as alpha (α) and is typically set at .05. When the observed P value is less than α, one rejects the null hypothesis (H0) and accepts the alternative. Clinical significance is even more important than statistical significance, so treatment effect estimates and confidence intervals should be reported routinely. A type I error occurs when the H0 of no difference or no association is rejected when in fact the H0 is true. A type II error occurs when the H0 is not rejected when in fact there is a true population effect. Power is the probability of detecting a true difference, effect, or association if it truly exists. Sample size justification and power analysis are key elements of study design. Ethical concerns arise when studies are poorly planned or underpowered. When calculating sample size for comparing groups, 4 quantities are needed: α, the type II error rate (β, where power = 1 − β), the difference or effect of interest, and the estimated variability of the outcome variable. Sample size increases with increasing variability and power, and with decreasing α and a decreasing difference to detect. For a given relative reduction in proportions, the required sample size depends heavily on the proportion in the control group itself, and increases as that proportion decreases.
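The central limit theorem statement above can be checked with a small simulation. The sketch below is illustrative only: it assumes a skewed Exponential(1) source population, samples of size n = 30, 2,000 replications, and a fixed seed, and the helper names `skewness` and `simulate_sample_means` are hypothetical. It compares the skewness of the individual draws with the skewness of the sample means.

```python
import random
import statistics


def skewness(xs):
    """Moment-based skewness: the mean of the cubed z-scores (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs, m)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)


def simulate_sample_means(n, reps, seed=2018):
    """Draw `reps` samples of size n from an Exponential(1) population (assumed for
    illustration; its skewness is 2) and return all raw draws plus each sample's mean."""
    rng = random.Random(seed)
    raw, means = [], []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        raw.extend(sample)
        means.append(statistics.fmean(sample))
    return raw, means


raw, means = simulate_sample_means(n=30, reps=2000)
print(f"skewness of individual draws: {skewness(raw):.2f}")    # close to 2 (strongly skewed)
print(f"skewness of sample means:     {skewness(means):.2f}")  # much closer to 0
print(f"mean of sample means:         {statistics.fmean(means):.3f}")  # close to the population mean, 1
print(f"SD of sample means:           {statistics.stdev(means):.3f}")  # close to 1/sqrt(30), about 0.183
```

The population itself stays skewed; only the distribution of the sample mean becomes approximately normal, which is the distinction the theorem draws.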
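The 4 quantities named above (α, β or power, the difference of interest, and the outcome's variability) combine in the standard normal-approximation sample size formulas. The sketch below assumes equal group sizes, a two-sided test, and, for proportions, the simple unpooled-variance formula; pooled and continuity-corrected variants exist and give somewhat different numbers. The function names are hypothetical, and real planning would normally use dedicated software.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_sum(alpha, power):
    """z_{1 - alpha/2} + z_{1 - beta} for a two-sided test, where power = 1 - beta."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)


def n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n to detect a difference in means `delta`, assuming a common SD `sigma`."""
    return math.ceil(2 * (z_sum(alpha, power) * sigma / delta) ** 2)


def n_per_group_props(p_control, p_treat, alpha=0.05, power=0.80):
    """Per-group n to detect p_control vs p_treat (unpooled normal approximation)."""
    var_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil(z_sum(alpha, power) ** 2 * var_sum / (p_control - p_treat) ** 2)


print(n_per_group_means(delta=0.5, sigma=1.0))              # 63
print(n_per_group_means(delta=0.5, sigma=1.5))              # larger: more variability
print(n_per_group_means(delta=0.5, sigma=1.0, power=0.90))  # larger: more power
print(n_per_group_means(delta=0.5, sigma=1.0, alpha=0.01))  # larger: smaller alpha
print(n_per_group_means(delta=0.25, sigma=1.0))             # larger: smaller difference

# Same 25% relative reduction, with a shrinking control-group proportion
for p0 in (0.40, 0.20, 0.10):
    print(p0, n_per_group_props(p0, p0 * 0.75))  # 354, 903, 2002
```

The last loop illustrates the point about proportions: holding the relative reduction fixed, each drop in the control-group proportion sharply increases the required sample size.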
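Type I error and power can also be estimated empirically by simulating many trials and counting how often H0 is rejected. The sketch below assumes normally distributed outcomes, a known SD, and a two-sided z-test (chosen so that only the standard library is needed; a t-test would be more usual in practice). `rejection_rate` is a hypothetical helper. With no true difference, the rejection rate estimates the type I error rate; with a true difference of 0.5 SD and 63 patients per group, it estimates power.

```python
import math
import random
from statistics import NormalDist, fmean

Z = NormalDist()  # standard normal distribution


def rejection_rate(true_diff, n, sigma=1.0, alpha=0.05, reps=4000, seed=7):
    """Fraction of simulated two-group trials (n per group) in which a two-sided
    z-test with known sigma rejects H0 of no difference in means."""
    rng = random.Random(seed)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n)  # standard error of the difference in means
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(true_diff, sigma) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


type_i = rejection_rate(true_diff=0.0, n=63)  # H0 true: should be near alpha = 0.05
power = rejection_rate(true_diff=0.5, n=63)   # H0 false: should be near 0.80
print(f"estimated type I error rate: {type_i:.3f}")
print(f"estimated power:             {power:.3f}")
```

Because the estimates come from a finite number of simulated trials, they vary slightly around the nominal 0.05 and 0.80.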
Sample size for single-group studies estimating an unknown parameter is based on the desired precision of the estimate, for example the acceptable half-width of its confidence interval. Interim analyses assessing for efficacy and/or futility are valuable tools for saving time and money and for allowing science to progress faster, but they are only 1 of the components considered when deciding whether to stop or continue a trial.
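Precision-based sample size for a single group can be sketched with the usual normal-approximation formulas, n = (z·σ/E)² for a mean and n = z²·p(1 − p)/E² for a proportion, where E is the desired confidence-interval half-width. The sketch below assumes a known (or well-estimated) σ, the simple Wald interval for a proportion, and hypothetical function names; choosing p = 0.5 gives the most conservative n when the true proportion is unknown.

```python
import math
from statistics import NormalDist


def n_for_mean(sigma, margin, conf=0.95):
    """n so that the CI for a mean has half-width <= margin (normal approximation, known sigma)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sigma / margin) ** 2)


def n_for_proportion(p_guess, margin, conf=0.95):
    """n so that the Wald CI for a proportion has half-width <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


print(n_for_proportion(0.5, 0.05))      # 385: +/- 5 percentage points at 95% confidence
print(n_for_mean(sigma=10, margin=2))   # 97
```

Halving the margin quadruples the required n, since the margin enters the formula squared.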

Keywords

Sample size determination, Type I and type II errors, Statistics, Null hypothesis, Central limit theorem, Confidence interval, Population, Statistical power, Sample (material), Statistical hypothesis testing, Statistical significance, Limit (mathematics), Mathematics, p-value, Range (aeronautics), Medicine, Econometrics, Physics, Mathematical analysis

MeSH Terms

Data Interpretation, Statistical; Humans; Normal Distribution; Probability; Research Design; Sample Size

Publication Info

Year: 2018
Type: article
Volume: 126
Issue: 2
Pages: 691-698
Citations: 205
Access: Closed

Cite This

APA Style

Edward J. Mascha, Thomas R. Vetter (2018). Significance, Errors, Power, and Sample Size: The Blocking and Tackling of Statistics. Anesthesia & Analgesia, 126(2), 691-698. https://doi.org/10.1213/ane.0000000000002741

Identifiers

DOI: 10.1213/ane.0000000000002741
PMID: 29346210
