Abstract

Designing, conducting, analyzing, reporting, and interpreting the findings of a research study require an understanding of the types and characteristics of data and variables. Descriptive statistics are typically used simply to calculate, describe, and summarize the collected research data in a logical, meaningful, and efficient way. Inferential statistics allow researchers to make a valid estimate of the association between an intervention and the treatment effect in a specific population, based upon their randomly collected, representative sample data. Categorical data can be either dichotomous or polytomous. Dichotomous data have only 2 categories and thus are considered binary. Polytomous data have more than 2 categories. Unlike dichotomous and polytomous data, ordinal data are rank ordered, typically based on a numerical scale that is composed of a small set of discrete classes or integers. Continuous data are measured on a continuum and can have any numeric value over this continuous range. Continuous data can be meaningfully divided into smaller and smaller or finer and finer increments, depending upon the precision of the measurement instrument. Interval data are a form of continuous data in which equal intervals represent equal differences in the property being measured. Ratio data are another form of continuous data; they have the same properties as interval data, plus a true, absolute zero point, so that the ratios of values on the measurement scale are meaningful. The normal (Gaussian) distribution (“bell-shaped curve”) is one of the most common statistical distributions. Many applied inferential statistical tests are predicated on the assumption that the analyzed data follow a normal distribution. The histogram and the Q–Q plot are 2 graphical methods to assess whether a set of data have a normal distribution (display “normality”). The Shapiro-Wilk test and the Kolmogorov-Smirnov test are 2 well-known and historically widely applied quantitative methods to assess data normality. Parametric statistical tests make certain assumptions about the characteristics and/or parameters of the underlying population distribution upon which the test is based, whereas nonparametric tests make fewer or less rigorous assumptions. If a normality test concludes that the study data deviate significantly from a Gaussian distribution, rather than applying a less robust nonparametric test, the problem can potentially be remedied by judiciously and openly (1) performing a data transformation of all the data values, or (2) eliminating any obvious data outlier(s).
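
As a concrete illustration of the normality checks described above, here is a minimal Python sketch (assuming NumPy, SciPy, and Matplotlib are available; the sample data are simulated for illustration, not taken from the article). It applies the Shapiro-Wilk and Kolmogorov-Smirnov tests and draws the histogram and Q–Q plot:

# Minimal sketch of the normality checks discussed in the abstract.
# The data below are simulated, right-skewed values, not study data.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(seed=42)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=100)

# Quantitative tests: the null hypothesis is that the data are normal.
w_stat, w_p = stats.shapiro(sample)
# Estimating the mean/SD from the sample (as below) makes this the
# Lilliefors variant; the classical K-S test assumes known parameters.
ks_stat, ks_p = stats.kstest(sample, "norm",
                             args=(sample.mean(), sample.std(ddof=1)))
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {w_p:.4f}")
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.4f}")

# Graphical checks: histogram and Q-Q plot against a normal distribution.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.hist(sample, bins=15)
ax1.set_title("Histogram")
stats.probplot(sample, dist="norm", plot=ax2)
plt.tight_layout()
plt.show()

A small p value from either test suggests a significant departure from normality, at which point the transformation or outlier-handling remedies mentioned in the abstract come into play.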

Keywords

Categorical variable; Polytomous Rasch model; Ordinal data; Statistics; Binary data; Range (aeronautics); Data set; Sample size determination; Population; Data mining; Mathematics; Binary number; Computer science; Medicine; Item response theory; Psychometrics

MeSH Terms

Biomedical Research; Data Interpretation, Statistical; Humans; Normal Distribution; Sample Size

Related Publications

An Analysis of Transformations

Summary: In the analysis of data it is often assumed that observations y₁, y₂, …, yₙ are independently normally distributed with constant variance and with expectations specifi...

1964 · Journal of the Royal Statistical Society · 14,698 citations
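
This related publication is the origin of the Box-Cox family of power transformations, one option for the data transformation remedy mentioned in the abstract. A brief sketch using SciPy's implementation (the data are simulated, positive-valued, and purely illustrative):

# Box-Cox transformation (Box & Cox, 1964) via SciPy, applied to
# simulated, strictly positive, right-skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
skewed = rng.lognormal(mean=1.0, sigma=0.6, size=200)  # values must be > 0

# With no lambda supplied, scipy.stats.boxcox estimates it by maximum likelihood.
transformed, lam = stats.boxcox(skewed)
print(f"Estimated lambda: {lam:.3f}")
print(f"Skewness before: {stats.skew(skewed):.2f}, "
      f"after: {stats.skew(transformed):.2f}")

An estimated lambda near 0 corresponds to a log transformation and a value near 1 to essentially no transformation, so the fitted lambda is often checked against these reference points.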

Publication Info

Year: 2017
Type: Review
Volume: 125
Issue: 4
Pages: 1375-1380
Citations: 155
Access: Closed

Citation Metrics

OpenAlex: 155
Influential: 0
CrossRef: 136

Cite This

Thomas R. Vetter (2017). Fundamentals of Research Data and Variables: The Devil Is in the Details. Anesthesia & Analgesia, 125(4), 1375-1380. https://doi.org/10.1213/ane.0000000000002370

Identifiers

DOI: 10.1213/ane.0000000000002370
PMID: 28787341

Data Quality

Data completeness: 81%