Abstract
The kappa statistic is frequently used to test interrater reliability. The importance of rater reliability lies in the fact that it represents the extent to which the data collected in the study are correct representations of the variables measured. Measurement of the extent to which data collectors (raters) assign the same score to the same variable is called interrater reliability. While a variety of methods for measuring interrater reliability exist, it was traditionally measured as percent agreement, calculated as the number of agreement scores divided by the total number of scores. In 1960, Jacob Cohen critiqued the use of percent agreement because it does not account for chance agreement. He introduced Cohen's kappa, which corrects for the possibility that raters simply guess on at least some variables due to uncertainty. Like most correlation statistics, kappa can range from -1 to +1. While kappa is one of the most commonly used statistics for testing interrater reliability, it has limitations; in particular, judgments about what level of kappa should be acceptable for health research are questionable. Cohen's suggested interpretation may be too lenient for health-related studies because it implies that a score as low as 0.41 might be acceptable. Kappa and percent agreement are compared, and levels of both kappa and percent agreement that should be demanded in healthcare studies are suggested.
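The two quantities the abstract contrasts can be made concrete. Below is a minimal Python sketch (the function names `percent_agreement` and `cohens_kappa` and the sample ratings are illustrative, not from the paper) that computes percent agreement and Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal score frequencies.

```python
import numpy as np

def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters assign the same score."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    return float(np.mean(a == b))

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e).

    p_o: observed proportion of agreement.
    p_e: agreement expected by chance, computed from each
         rater's marginal score frequencies.
    """
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_o = float(np.mean(a == b))
    # Chance agreement: for each category, the product of the two
    # raters' marginal proportions, summed over all categories.
    categories = np.union1d(a, b)
    p_e = sum(float(np.mean(a == c)) * float(np.mean(b == c))
              for c in categories)
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical example: two raters score the same 10 subjects
# on a binary variable.
ratings_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
ratings_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(percent_agreement(ratings_a, ratings_b))  # 0.8
print(cohens_kappa(ratings_a, ratings_b))       # ~0.524
```

On these sample ratings, percent agreement is 0.80 while kappa is roughly 0.52, which illustrates the abstract's point: once chance agreement is removed, apparent reliability drops.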
Publication Info
- Year: 2012
- Type: article
- Volume: 22
- Issue: 3
- Pages: 276-82
- Citations: 9228
- Access: Closed