Article

A Comparison of Consensus, Consistency, and Measurement Approaches to Estimating Interrater Reliability

Author
  • Steven E. Stemler (Wesleyan University)

Abstract

This article argues that the general practice of describing interrater reliability as a single, unified concept is at best imprecise, and at worst potentially misleading. Rather than representing a single concept, different statistical methods for computing interrater reliability can be more accurately classified into one of three categories based upon the underlying goals of analysis. The three general categories introduced and described in this paper are: 1) consensus estimates, 2) consistency estimates, and 3) measurement estimates. The assumptions, interpretation, advantages, and disadvantages of estimates from each of these three categories are discussed, along with several popular methods of computing interrater reliability coefficients that fall under the umbrella of consensus, consistency, and measurement estimates. Researchers and practitioners should be aware that different approaches to estimating interrater reliability carry with them different implications for how ratings across multiple judges should be summarized, which may impact the validity of subsequent study results. Accessed 123,170 times on https://pareonline.net from March 01, 2004 to December 31, 2019.

Keywords: Interrater Reliability, Rating Scales, Scoring, Scoring Rubrics, Error of Measurement, Evaluation Methods, Evaluators, Examiners

How to Cite:

Stemler, S. E. (2004) “A Comparison of Consensus, Consistency, and Measurement Approaches to Estimating Interrater Reliability”, Practical Assessment, Research, and Evaluation 9(1): 4. doi: https://doi.org/10.7275/96jp-xz07
