Article

Using Rater Cognition to Improve Generalizability of an Assessment of Scientific Argumentation

Authors
  • Katrina Borowiec
  • Courtney Castle

Abstract

Rater cognition or "think-aloud" studies have historically been used to enhance rater accuracy and consistency in writing and language assessments. As assessments are developed for new, complex constructs from the Next Generation Science Standards (NGSS), the present study illustrates the utility of extending "think-aloud" studies to science assessment. The study focuses on the development of rubrics for scientific argumentation, one of the NGSS Science and Engineering Practices. The initial rubrics were modified based on cognitive interviews with five raters. Next, a group of four new raters scored responses using both the original and revised rubrics. A psychometric analysis was conducted to measure the change in interrater reliability, accuracy, and generalizability (using a generalizability study, or "g-study") between the original and revised rubrics. Interrater reliability, accuracy, and generalizability all increased with the rubric modifications. Furthermore, follow-up interviews with the second group of raters indicated that most preferred the revised rubric. These findings illustrate that cognitive interviews with raters can enhance rubric usability and generalizability when assessing scientific argumentation, thereby improving assessment validity.
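For context on the g-study mentioned above: in generalizability theory, the generalizability coefficient for a persons-crossed-with-raters design is commonly estimated as shown below. This is a standard textbook formulation, not necessarily the exact variance-component model used in the study (see the full text for the authors' design):

$$E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr,e}/n_r}$$

where $\sigma^2_p$ is the variance attributable to persons (examinees), $\sigma^2_{pr,e}$ is the person-by-rater interaction variance confounded with residual error, and $n_r$ is the number of raters. A coefficient closer to 1 indicates that scores generalize better across raters, which is the sense in which the rubric revisions "increased generalizability."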

Keywords: Test Use, Test Construction, Test Format, Student Evaluation

How to Cite:

Borowiec, K., & Castle, C. (2019). “Using Rater Cognition to Improve Generalizability of an Assessment of Scientific Argumentation”. Practical Assessment, Research, and Evaluation, 24(1): 8. doi: https://doi.org/10.7275/ey9d-p954


Published on
02 Nov 2019