Improving Generative AI Student Feedback: Direct Preference Optimization with Teachers in the Loop

Authors
Juliette Woodrow, Sanmi Koyejo, Chris Piech
Stanford University

We present a system for improving LLM-generated student feedback through Direct Preference Optimization (DPO), using preferences collected from teachers in real time as they grade. The system was deployed in two offerings of a Stanford University course on probability and evaluated through both expert blind review and automated critic models.

Read the paper: Improving Generative AI Student Feedback: Direct Preference Optimization with Teachers in the Loop (Woodrow et al., 2025)
View the repo: GitHub repo
Learn more about DPO: Direct Preference Optimization (Rafailov et al., 2023)
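As background, the DPO objective (Rafailov et al., 2023) trains the policy directly on preference pairs — here, a teacher-preferred and a dispreferred feedback message — without a separate reward model. A minimal sketch of the per-pair loss, assuming summed token log-probabilities are already available; all names are illustrative, not this repo's API:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed token log-probs.

    logp_* are the policy's log-probabilities of the teacher-preferred
    (chosen) and dispreferred (rejected) feedback; ref_logp_* are the
    frozen reference model's log-probabilities of the same texts.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # feedback over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: minimized when the policy puts
    # relatively more probability on the teacher-preferred feedback.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss shrinks as the policy's margin for the teacher-preferred feedback grows; at a zero margin it equals log 2. In practice a library such as Hugging Face TRL's DPOTrainer computes this batched over token-level log-probs.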

What’s in This Repo

Model Code

Training Setup

Inference Setup

Evaluation Setup

Custom Critic Model

Overall