Abstract
Designing, conducting, analyzing, reporting, and interpreting the findings of a research study require an understanding of the types and characteristics of data and variables. Descriptive statistics are typically used simply to calculate, describe, and summarize the collected research data in a logical, meaningful, and efficient way. Inferential statistics allow researchers to make a valid estimate of the association between an intervention and the treatment effect in a specific population, based upon their randomly collected, representative sample data. Categorical data can be either dichotomous or polytomous. Dichotomous data have only 2 categories, and thus are considered binary. Polytomous data have more than 2 categories. Unlike dichotomous and polytomous data, ordinal data are rank ordered, typically based on a numerical scale that is composed of a small set of discrete classes or integers. Continuous data are measured on a continuum and can have any numeric value over this continuous range. Continuous data can be meaningfully divided into smaller and smaller or finer and finer increments, depending upon the precision of the measurement instrument. Interval data are a form of continuous data in which equal intervals represent equal differences in the property being measured. Ratio data are another form of continuous data, which have the same properties as interval data, plus a true definition of an absolute zero point, and the ratios of the values on the measurement scale make sense. The normal (Gaussian) distribution (“bell-shaped curve”) is one of the most common statistical distributions. Many applied inferential statistical tests are predicated on the assumption that the analyzed data follow a normal distribution. The histogram and the Q–Q plot are 2 graphical methods to assess whether a set of data have a normal distribution (display “normality”). The Shapiro-Wilk test and the Kolmogorov-Smirnov test are 2 well-known and historically widely applied quantitative methods to assess for data normality. Parametric statistical tests make certain assumptions about the characteristics and/or parameters of the underlying population distribution upon which the test is based, whereas nonparametric tests make fewer or less rigorous assumptions. If the normality test concludes that the study data deviate significantly from a Gaussian distribution, rather than applying a less robust nonparametric test, the problem can potentially be remedied by judiciously and openly: (1) performing a data transformation of all the data values; or (2) eliminating any obvious data outlier(s).
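The normality checks and the transformation remedy described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the article: the function names (`qq_points`, `qq_correlation`, `ks_statistic`), the synthetic log-normal sample, the plotting positions (i - 0.5)/n, and the choice of a log transformation are all assumptions made for illustration. It uses only the standard library, computes the points that a normal Q–Q plot would display, summarizes their straightness with a correlation coefficient, and computes the Kolmogorov-Smirnov distance against a normal distribution fitted to the sample. No P value is produced, because the standard Kolmogorov-Smirnov critical values do not apply when the mean and standard deviation are estimated from the same data.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; values near 1 suggest normality."""
    x, y = qq_points(values)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ks_statistic(values):
    """Kolmogorov-Smirnov distance D between the empirical CDF and a normal
    distribution fitted with the sample mean and standard deviation."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    d = 0.0
    for i, v in enumerate(ordered, start=1):
        cdf = fitted.cdf(v)
        # Compare the fitted CDF with the empirical CDF just after and just before v.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Synthetic, right-skewed data (hypothetical; not from the article).
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]
# The transformation is applied to ALL data values, as the abstract advises.
logged = [math.log(v) for v in skewed]

for label, data in (("raw", skewed), ("log-transformed", logged)):
    r = qq_correlation(data)
    d = ks_statistic(data)
    print(f"{label}: Q-Q correlation = {r:.3f}, KS distance = {d:.3f}")
```

In this synthetic case the log-transformed values should give a Q–Q correlation closer to 1 and a smaller Kolmogorov-Smirnov distance than the raw values. For a formal test with P values, an established statistics package would be the more appropriate tool than this hand-rolled sketch.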
Publication Info
- Year: 2017
- Type: review
- Volume: 125
- Issue: 4
- Pages: 1375-1380
- Citations: 155
- Access: Closed
Identifiers
- DOI: 10.1213/ane.0000000000002370
- PMID: 28787341