Oversight in Action: Experiences with Instructor-Moderated LLM Responses in an Online Discussion Forum
Using AI to draft replies to discussion posts saves instructors time
Qiao, S., Denny, P., & Giacaman, N. (2024). Oversight in Action: Experiences with Instructor-Moderated LLM Responses in an Online Discussion Forum (No. arXiv:2412.09048). arXiv. https://doi.org/10.48550/arXiv.2412.09048
AI turns out to be particularly well-suited to drafting replies in asynchronous channels such as discussion boards (or email), because the lack of a real-time expectation leaves instructors time to review and edit drafts before they are posted.
Online class discussion forums, which are widely used in computing education, present an opportunity for exploring instructor oversight because they operate asynchronously. Unlike real-time interactions, the discussion forum format aligns with the expectation that responses may take time, making oversight not only feasible but also pedagogically appropriate. In this practitioner paper, we present the design, deployment, and evaluation of a 'bot' module that is controlled by the instructor, and integrated into an online discussion forum. The bot assists the instructor by generating draft responses to student questions, which are reviewed, modified, and approved before release. Key features include the ability to leverage course materials, access archived discussions, and publish responses anonymously to encourage open participation. We report our experiences using this tool in a 12-week second-year software engineering course on object-oriented programming. Instructor feedback confirmed the tool successfully alleviated workload but highlighted a need for improvement in handling complex, context-dependent queries.
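A minimal sketch of the instructor-in-the-loop flow the abstract describes: the LLM drafts a reply grounded in course material, but nothing reaches the forum until the instructor approves (and possibly edits) it. The paper does not publish its implementation, so the OpenAI client, the model name, and the `forum.post` call below are illustrative assumptions, not the authors' code.

```python
import os
from dataclasses import dataclass

from openai import OpenAI  # assumption: an OpenAI-style chat API backs the bot

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


@dataclass
class DraftReply:
    question: str
    draft: str
    approved: bool = False
    final_text: str = ""


def generate_draft(question: str, course_excerpts: str) -> DraftReply:
    """Draft an answer to a student question, grounded in course material."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": ("You draft replies to student forum questions. "
                         "Ground your answer in these course excerpts:\n"
                         + course_excerpts)},
            {"role": "user", "content": question},
        ],
    )
    return DraftReply(question=question,
                      draft=response.choices[0].message.content)


def approve_and_publish(reply: DraftReply, edited_text: str, forum) -> None:
    """Instructor reviews/edits the draft; only approved text is posted."""
    reply.final_text = edited_text
    reply.approved = True
    # `forum` is a stand-in for the discussion-board API; posting anonymously
    # mirrors the paper's option to encourage open participation.
    forum.post(reply.final_text, anonymous=True)
```

The key design point is that the generate and publish steps are decoupled: the draft sits in a queue until a human releases it, which is exactly what the asynchronous forum format makes affordable.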
Also, when instructors did edit the messages, they deleted far more text than they added.
Most edited answers involved removing content from AIDA's drafted answer; additions were less common. The high proportion of removals suggests that the generated answers often contained content that was redundant, overly detailed, or that the instructor felt was inappropriate. The relatively small number of additions suggests that the drafts generally contained the necessary information and required only minor modification. Of the 95 adopted generated answers, over half required fewer than 10 edits (additions and removals combined), and very few required a large number of edits.
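The paper's notes here don't specify the granularity at which edits were counted, so treat the following as one assumed operationalization: a word-level diff between the draft and the posted answer, using Python's standard difflib to count additions and removals separately.

```python
import difflib


def count_edits(draft: str, final: str) -> tuple[int, int]:
    """Count word-level additions and removals between draft and final text."""
    matcher = difflib.SequenceMatcher(a=draft.split(), b=final.split())
    added = removed = 0
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            removed += a1 - a0  # words dropped from the draft
        if op in ("insert", "replace"):
            added += b1 - b0    # words the instructor added
    return added, removed


# Toy example mirroring the observed pattern (more removals than additions):
draft = "The constructor runs automatically and it also sets default values for every field."
final = "The constructor runs automatically and sets default values for every field."
added, removed = count_edits(draft, final)
print(f"added={added}, removed={removed}")  # added=0, removed=2
```

Under a metric like this, "fewer than 10 edits" for over half of the 95 adopted answers would mean the instructor typically trimmed a handful of words rather than rewriting the draft.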