Matta, M., Mercer, S., & Keller-Margulis, M. (2023). Implications of bias in automated writing quality scores for fair and equitable assessment decisions. School Psychology, 38(3), 173-181. https://doi.org/10.1037/spq0000517
Implications of Bias in Automated Writing Quality Scores for Fair and Equitable Assessment Decisions
Matta, M.; Mercer, S.; Keller-Margulis, M.
2023
Abstract
Recent advances in automated writing evaluation have enabled educators to use automated writing quality scores to improve assessment feasibility. However, there has been limited investigation of bias in automated writing quality scores for students from diverse racial or ethnic backgrounds. The use of biased scores could contribute to unfair practices with negative consequences for student learning. The goal of this study was to investigate score bias in writeAlizer, a free and open-source automated writing evaluation program. For 421 students in Grades 4 and 7 who completed a state writing exam that included composition and multiple-choice revising and editing questions, writeAlizer was used to generate automated writing quality scores for the composition section. We then used multiple regression models to investigate whether writeAlizer scores demonstrated differential prediction of the composition and overall scores on the state-mandated writing exam for students from different racial or ethnic groups. No evidence of bias in the automated scores was observed. However, after controlling for automated scores in Grade 4, we found statistically significant group differences in regression models predicting overall state test scores 3 years later, but not the essay composition scores. We hypothesize that the multiple-choice revising and editing sections, rather than the scoring approach used for the essay portion, introduced construct-irrelevant variance and might lead to differential performance among groups. Implications for assessment development and score use are discussed.
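The differential-prediction approach described in the abstract can be illustrated with a minimal sketch: regress the criterion score on the automated score, then test whether adding group membership and group-by-score interaction terms (intercept and slope differences) improves prediction. The sketch below uses Python with statsmodels and entirely hypothetical data and column names (state_score, auto_score, group); it is not the authors' analysis code, only an illustration of the general technique.

```python
# Minimal sketch of a differential-prediction (intercept/slope bias) check.
# Data, column names, and group labels are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "state_score": [310, 325, 298, 340, 305, 332, 288, 315],  # criterion (e.g., state exam score)
    "auto_score":  [2.1, 2.8, 1.9, 3.2, 2.3, 3.0, 1.7, 2.6],  # automated writing quality score
    "group":       ["A", "B", "A", "B", "A", "B", "A", "B"],  # racial/ethnic group label
})

# Step 1: common regression line (no group terms).
base = smf.ols("state_score ~ auto_score", data=df).fit()

# Step 2: add group main effects (intercept differences) and
# group-by-predictor interactions (slope differences).
full = smf.ols("state_score ~ auto_score * C(group)", data=df).fit()

# An F test comparing the nested models indicates whether the group terms
# improve prediction, i.e., whether prediction differs across groups.
f_stat, p_value, df_diff = full.compare_f_test(base)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```

In this framework, a nonsignificant improvement from the group terms is consistent with the study's finding of no prediction bias for the automated scores, whereas a significant improvement would signal differential prediction for at least one group.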