Collection

Assessing Student-AI Collaboration: Innovative Grading & Rubric Strategies

Generative AI has brought new urgency to longstanding assessment challenges. How do we grade individual contributors within a collaboration? What belongs to the human, and what to the technology? This collection offers new approaches to assessing human-AI collaboration in educational settings.

Updated November 2025
Kiera Allison
Assistant Professor
McIntire School of Commerce
01

Do Something Impossible with AI (then Evaluate the Collaboration)

Kiera Allison

Kiera Allison is an assistant professor of commerce and 2024-2025 Faculty AI Guide at UVA. In this assignment for a management communication course, she asks students to take on a persuasive task that feels impossible and explore how AI might help them accomplish that task.

Kiera Allison

My assignment asks students to develop and evaluate their own ways of working with AI, something that requires a little scaffolding via earlier AI assignments. I also have students share with each other (through their presentations) how they worked with AI, turning the course into more of a learning community.

02

Postplagiarism: transdisciplinary ethics and integrity in the age of artificial intelligence and neurotechnology

International Journal for Educational Integrity

What does academic integrity look like in a hybrid human-AI world? Rather than calling for stricter regulation or monitoring of AI, Sarah Eaton offers a provocative alternative: assume that work will be jointly produced by humans and technology, and reimagine what responsible, ethical writing looks like in this new reality.

Kiera Allison

This piece offers a vocabulary and concrete examples to help us imagine what integrated human-AI work can look like in higher education. Eaton invites instructors to move beyond increasingly untenable rules and conventions (e.g., originality, plagiarism, cheating detection) and offers a robust alternative vision of what writing and creation can become in an AI-assisted landscape.

View excerpt

Hybrid human-AI writing will become normal

The first principle of postplagiarism is that hybrid writing co-created by human and artificial intelligence is becoming prevalent and will soon become the norm. Text generated by artificial intelligence tools is not static. It can be edited, revised, reworked, and remixed. The result can be a product that is neither fully written by a human, nor by an AI, but one that is hybrid. Trying to determine where the human ends and where the artificial intelligence begins is pointless.

As AI tools become increasingly sophisticated, the probability of accurately detecting whether the text was written by a human or an artificial intelligence diminishes (Elkhatat et al. 2023). In August 2023, OpenAI, the company behind ChatGPT, definitively declared that text generated by artificial intelligence applications cannot be detected (OpenAI 2023). This comes on the heels of numerous news stories about students being falsely accused of academic misconduct after teachers had used so-called AI text-generation detection tools on students’ academic work (e.g., Fowler 2023; Jimenez 2023; Verma 2023).

There are strong signals that AI capabilities will soon be integrated into technologies we use every day such as Microsoft Office or Google Workspace. Already some social media platforms offer an option to users to have AI help write their posts. We only need to pay attention to what is happening around us to see that AI capabilities for text- and non-text-based applications will soon be part of every technology we use.

03

ChatBench: From Static Benchmarks to Human-AI Evaluation

Serina Chang, Ashton Anderson, Jake Hofman

Chang offers an IT angle on measuring human-AI collaboration, arguing that while LLMs are typically evaluated on autonomous tasks, the better question is how they perform in the hands of human users. Her study reveals how humans elevate or derail AI performance depending on the task, offering practical methods for analyzing the collaboration.

Kiera Allison

Chang's research captures a core assessment challenge, demonstrating that the impulse to separate technology from user is both pervasive and a source of flawed performance measures. Offering specific examples of how humans and LLMs interact to produce results of differing quality, this piece models the critical shift from asking "what can a human or an AI do alone?" to "how can humans and AI perform together?"

View excerpt

With the rapid adoption of LLM-based chatbots, there is a pressing need to evaluate what humans and LLMs can achieve together. However, standard benchmarks, such as MMLU, measure LLM capabilities in isolation (i.e., "AI-alone"). Here, we design and conduct a user study to convert MMLU questions into user-AI conversations, by seeding the user with the question and having them carry out a conversation with the LLM to answer their question. We release ChatBench, a new dataset with AI-alone, user-alone, and user-AI data for 396 questions and two LLMs, including 144K answers and 7,336 user-AI conversations. We find that AI-alone accuracy fails to predict user-AI accuracy, with significant differences across multiple subjects (math, physics, and moral reasoning), and we analyze the user-AI conversations to provide insight into how they diverge from AI-alone benchmarks. Finally, we show that fine-tuning a user simulator on a subset of ChatBench improves its ability to estimate user-AI accuracies, increasing correlation on held-out questions by more than 20 points, creating possibilities for scaling interactive evaluation.

04

Evaluating humans in the age of AI: A backward-design approach to assessing student-AI collaborative work

Kiera Allison

In addition to collaborating with AI and developing rubrics, our students were invited to participate in a pre/post survey about their experience. In this presentation of our study findings, we explore key student insights about what they think deserves to be graded when humans and AI collaborate.

Kiera Allison

This presentation foregrounds student voices on AI collaboration and assessment, revealing fundamental tensions between evaluating individual contributions versus the collaboration itself—and demonstrating how assessment criteria must evolve regardless of which approach instructors choose.

View excerpt

Mega-SoTL Investigation of Artificial Intelligence in Teaching and Learning: Student Perspectives

Authors: Kiera Allison, Charlotte Hoopes, Gianluca Guadagni, Katya Koubek, Breana Bayraktar, Dayna Henry, Jess Taggart

  1. How can we effectively measure and assess the cognitive work of students within their AI collaborations? 
  2. How do students perceive their role and contribution within AI-assisted projects?


Want to recommend a resource to add to this collection? Send us an email.