Article

From Investigating the Alignment of A Priori Item Characteristics Based on the CTT and Four-Parameter Logistic (4-PL) IRT Models to Further Exploring the Comparability of the Two Models

Authors
  • Agus Santoso
  • Heri Retnawati
  • Timbul Pardede
  • Ezi Apino
  • Ibnu Rafi
  • Munaya Rosyada
  • Gulzhaina Kassymova
  • Xu Wenxin

Abstract

The test blueprint is important in test development because it guides item writers in creating test items that meet the desired objectives and specifications, the so-called a priori item characteristics, such as the difficulty category of each item and the distribution of items across difficulty levels. Because the intended difficulty level of an item (easy, medium, or hard) is influenced by the perceptions, knowledge, and experience of the item writer, expert judgment alone is not enough: item analysis based on empirical data under a specific measurement framework is also needed to ensure that the items and the test as a whole have appropriate characteristics. The present study investigated the extent to which the a priori characteristics (i.e., item difficulty) of the items of a Business English test taken by 4,836 Universitas Terbuka (UT) students aligned with the characteristics estimated from empirical data under classical test theory (CTT) and the four-parameter logistic (4-PL) IRT model. Given that two measurement models were used, we extended the study to explore the comparability of CTT and the 4-PL model in terms of the item difficulty and discrimination estimates they yielded, and to examine the relationship between the pseudo-guessing and carelessness parameters. Our study found insufficient support for asserting that the characteristics of the items used in the Business English test align with the characteristics expected by the test developers. The two models were comparable in terms of the item difficulty estimates they yielded but not in terms of the item discrimination estimates. Our study also did not find a linear association between the pseudo-guessing and carelessness parameters estimated under the 4-PL model. Further findings of our study and their implications, especially for test development practices, are discussed.
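
For readers less familiar with the two frameworks, the following minimal sketch illustrates the quantities named in the abstract. It assumes the common textbook form of the 4-PL item response function, with discrimination a, difficulty b, pseudo-guessing (lower asymptote) c, and an upper asymptote d whose shortfall from 1 is usually read as carelessness. The function names, the parameter values, and the omission of any scaling constant are illustrative assumptions and do not reproduce the authors' estimation procedure.

```python
import math


def p_correct_4pl(theta, a, b, c, d):
    """Probability of a correct response under an assumed 4-PL form.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing)
    d: upper asymptote (1 - d is often read as carelessness)
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def ctt_difficulty(item_scores):
    """CTT item difficulty: proportion of examinees answering correctly.

    item_scores is a list of 0/1 scores for one item.
    Higher values indicate an easier item.
    """
    return sum(item_scores) / len(item_scores)


# Hypothetical values, chosen only for illustration.
p_at_b = p_correct_4pl(theta=0.0, a=1.2, b=0.0, c=0.2, d=0.9)
p_value = ctt_difficulty([1, 0, 1, 1])
```

When ability equals the item difficulty (theta = b), the assumed 4-PL probability falls midway between the two asymptotes, (c + d) / 2. The CTT index, by contrast, is a sample proportion, so it depends on the group of examinees who took the test.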

Keywords: carelessness, classical test theory, item characteristics, item response theory, pseudo-guessing

How to Cite:

Santoso, A., Retnawati, H., Pardede, T., Apino, E., Rafi, I., Rosyada, M., Kassymova, G. & Wenxin, X. (2024) “From Investigating the Alignment of A Priori Item Characteristics Based on the CTT and Four-Parameter Logistic (4-PL) IRT Models to Further Exploring the Comparability of the Two Models”, Practical Assessment, Research, and Evaluation 29(1): 14. doi: https://doi.org/10.7275/pare.2043

Published on
11 Nov 2024
Peer Reviewed