Abstract
The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While a variety of methods for measuring interrater reliability exist, it was traditionally measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued the use of percent agreement because it does not account for chance agreement. He introduced Cohen's kappa, which corrects for the possibility that raters simply guess on at least some variables due to uncertainty. Like most correlation statistics, kappa can range from -1 to +1. While kappa is one of the most commonly used statistics for testing interrater reliability, it has limitations; in particular, judgments about what level of kappa should be acceptable for health research are questionable. Cohen's suggested interpretation may be too lenient for health-related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels of both kappa and percent agreement that should be demanded in healthcare studies are suggested.
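The two quantities the abstract contrasts can be made concrete. Below is a minimal Python sketch (the function names `percent_agreement` and `cohens_kappa` and the sample ratings are illustrative, not from the paper) that computes percent agreement and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal score frequencies.

```python
import numpy as np

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters assign the same score."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return float(np.mean(a == b))

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e).

    p_o: observed proportion of agreement.
    p_e: agreement expected by chance, computed from each
         rater's marginal score frequencies.
    """
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_o = float(np.mean(a == b))
    # Chance agreement: for each category, the product of the two
    # raters' marginal proportions, summed over all categories.
    categories = np.union1d(a, b)
    p_e = sum(float(np.mean(a == c)) * float(np.mean(b == c))
              for c in categories)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical example: two raters score the same 10 subjects
# on a binary variable.
ratings_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
ratings_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(percent_agreement(ratings_a, ratings_b))  # 0.8
print(cohens_kappa(ratings_a, ratings_b))       # ~0.524
```

On these sample ratings, percent agreement is 0.80 while kappa is roughly 0.52, which illustrates the abstract's point: once chance agreement is removed, apparent reliability drops.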
Publication Info
- Year: 2012
- Type: article
- Volume: 22
- Issue: 3
- Pages: 276-82
- Citations: 9228
- Access: Closed