Article: Learning Technologies

Comparing the Quality of Human and ChatGPT Feedback of Students’ Writing


Derek’s Recommendation

Is it even worth trying to design an AI chatbot that gives students feedback on their work? Can generative AI provide the kind of useful feedback that expert teachers do? This study shows that, at least some of the time, if the chatbot is prompted well, the answer is "yes."
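
The study's own prompt isn't reproduced in this summary, but to make "prompted well" concrete, here is a minimal sketch of a rubric-anchored feedback prompt, assuming the OpenAI Python SDK. The model name, rubric criteria, and prompt wording are illustrative assumptions, not the researchers' actual materials.

```python
# A minimal sketch of prompting ChatGPT for formative writing feedback,
# assuming the OpenAI Python SDK (pip install openai) and an API key in
# the OPENAI_API_KEY environment variable. The rubric criteria, model
# choice, and prompt wording are illustrative, not the study's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEEDBACK_PROMPT = """\
You are an experienced writing teacher giving formative feedback on a
student draft. Using the rubric criteria below, respond with:
1. One strength of the draft, stated specifically.
2. The two highest-priority areas for improvement, with clear,
   concrete directions for revision.
3. A supportive, encouraging closing sentence.

Rubric criteria: {criteria}

Student draft:
{draft}
"""


def get_feedback(draft: str, criteria: str, model: str = "gpt-4o") -> str:
    """Return formative feedback on a student draft."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": FEEDBACK_PROMPT.format(criteria=criteria, draft=draft),
            },
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(get_feedback(
        draft="The Civil War was fought for many reasons...",
        criteria="clear thesis; use of evidence; organization",
    ))
```

The key design choice mirrored here is that the prompt asks for prioritized, actionable directions in a supportive tone, the same qualities on which the study compared human and ChatGPT feedback.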

We found that human raters, at least the well-trained, paid, and relatively time-rich evaluators in this sample, provided higher-quality feedback in four of five critical areas: clarity of directions for improvement, accuracy, prioritization of essential features, and use of a supportive tone. The ability of experienced, well-resourced human educators to provide high-quality formative feedback was impressive.

However, the most important takeaway of the study is not that expert humans performed better than ChatGPT (hardly a surprising finding), but rather that ChatGPT's feedback came relatively close to human quality without requiring any training. To our knowledge, no previous study has compared automated and human feedback on writing, because the quality of automated feedback has been so poor that such a comparison would have been futile. Now, the small differences between the two modes of feedback suggest that feedback generated by ChatGPT can likely serve valuable instructional purposes, particularly in the early stages of writing, where timely feedback can motivate students' revision work.