Abstract
In 1885, Sir Francis Galton first defined the term "regression" and completed the theory of bivariate correlation. A decade later, Karl Pearson developed the index that we still use to measure correlation, Pearson's r. Our article is written in recognition of the 100th anniversary of Galton's first discussion of regression and correlation. We begin with a brief history. Then we present 13 different formulas, each of which represents a different computational and conceptual definition of r. Each formula suggests a different way of thinking about this index, drawing on algebraic, geometric, and trigonometric settings. We show that Pearson's r (or simple functions of r) may variously be thought of as a special type of mean, a special type of variance, the ratio of two means, the ratio of two variances, the slope of a line, the cosine of an angle, and the tangent to an ellipse, and may be looked at from several other interesting perspectives.
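As a quick illustration of a few of these perspectives (not part of the original abstract), here is a minimal sketch that computes Pearson's r three equivalent ways on made-up data: as the cosine of the angle between centered vectors, as the slope of the least-squares line through standardized scores, and as the mean of the products of z-scores. The data and variable names are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not from the paper): three equivalent views of Pearson's r.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.6 * x + rng.normal(size=100)        # hypothetical correlated data

xc, yc = x - x.mean(), y - y.mean()        # centered vectors

# 1) r as the cosine of the angle between the centered vectors
r_cosine = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# 2) r as the slope of the least-squares line fit to standardized scores
zx, zy = xc / x.std(), yc / y.std()
r_slope = np.polyfit(zx, zy, 1)[0]

# 3) r as the mean of the products of z-scores (a "special type of mean")
r_mean = np.mean(zx * zy)

print(r_cosine, r_slope, r_mean, np.corrcoef(x, y)[0, 1])  # all agree
```

All four printed values coincide (up to floating-point error), which is the point of the paper's "thirteen ways": the same index falls out of several distinct conceptual definitions.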
Publication Info
- Year: 1988
- Type: article
- Volume: 42
- Issue: 1
- Pages: 59-66
- Citations: 2886
- Access: Closed
Identifiers
- DOI: 10.1080/00031305.1988.10475524