How raters differ: A study of structured oral mathematics assessment
Abstract
This article explores where and why raters disagree in structured oral mathematics assessments. Drawing on Swedish data, six experienced raters evaluated 74 student performances across three oral test formats from national upper-secondary examinations, scoring student reasoning, communication, and method with shared rubrics. Despite the structured tasks, raters showed substantial disagreement, especially when evaluating reasoning and communication. Multiple agreement measures and Svensson’s method were used to identify both systematic and unsystematic patterns of divergence. Raters also reported high confidence in their scoring, which was often misaligned with their actual level of agreement. Using a typological framework for oral assessment, the study shows how format structure and interaction shape how raters interpret the scoring criteria. These findings underscore the need for reliable assessment practices in systems increasingly focused on competencies and accountability. The article identifies four strategies for improving scoring consistency: enhanced rubrics, rater training, reflective tools, and collaborative assessment. It argues that reliable oral assessment is both important and achievable.
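
For readers unfamiliar with pairwise agreement measures of the kind the abstract refers to, the sketch below shows two common ones, percent exact agreement and unweighted Cohen’s kappa, computed for hypothetical rater scores. It is a minimal illustration only: the rater labels and score data are invented, and the article’s own analyses (including Svensson’s method for ordinal paired data) are not reproduced here.

from collections import Counter
from itertools import combinations

def percent_agreement(a, b):
    # Share of performances given the same score by two raters.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    # Unweighted Cohen's kappa: observed agreement corrected for
    # the agreement expected from each rater's score distribution.
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[c] / n) * (cb[c] / n) for c in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

# Hypothetical 0-2 scores from three raters on ten performances.
ratings = {
    "R1": [2, 1, 0, 2, 1, 1, 0, 2, 1, 2],
    "R2": [2, 1, 1, 2, 0, 1, 0, 1, 1, 2],
    "R3": [1, 1, 0, 2, 1, 2, 0, 2, 0, 2],
}

for (r1, s1), (r2, s2) in combinations(ratings.items(), 2):
    print(f"{r1} vs {r2}: agreement={percent_agreement(s1, s2):.2f}, "
          f"kappa={cohens_kappa(s1, s2):.2f}")
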
Keywords: Oral assessment, Inter-rater reliability, Mathematics education, Rater judgement, Structured assessment formats
How to Cite:
Sollerman, S. (2026) “How raters differ: A study of structured oral mathematics assessment”, Practical Assessment, Research, and Evaluation 31(1): 2. doi: https://doi.org/10.7275/pare.3268
