New Research Addresses Reward Modeling and Grading Challenges in LLM Fine-Tuning

Three arXiv papers tackle reward over-optimization in LLM post-training and rubric-based approaches to improving automated grading accuracy.

Three recent papers on arXiv explore critical challenges in large language model (LLM) training and evaluation.

According to arXiv:2509.21500v3, “Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training,” reinforcement fine-tuning (RFT) frequently encounters reward over-optimization. The paper observes that “a policy model hacks the reward signals to achieve high scores while producing low-quality outputs,” and backs this observation with a theoretical analysis of the underlying reward mechanism.
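The failure mode the paper describes can be illustrated with a toy example. The sketch below is not the paper's method: `proxy_reward`, `true_quality`, and the candidate responses are hypothetical stand-ins showing how a policy can maximize a learned proxy reward while true quality degrades.

```python
# Toy illustration of reward over-optimization ("reward hacking").
# All names and numbers here are illustrative assumptions, not from the paper.

def proxy_reward(response: str) -> float:
    """Stand-in for a learned reward model: naively favors longer responses."""
    return float(len(response.split()))

def true_quality(response: str) -> float:
    """Hypothetical gold judgment: rewards substance, penalizes repetition."""
    words = response.split()
    unique_ratio = len(set(words)) / max(len(words), 1)
    return unique_ratio * min(len(words), 10)

candidates = [
    "Paris is the capital of France.",   # concise, correct
    "Paris. " * 40,                      # degenerate, repetitive padding
]

best_by_proxy = max(candidates, key=proxy_reward)      # picks the padded output
best_by_quality = max(candidates, key=true_quality)    # picks the concise one

# The proxy prefers the degenerate answer even though its true quality is lower:
# this divergence is the "reward hacking" gap the paper analyzes.
```

Optimizing candidates against `proxy_reward` alone selects exactly the output that `true_quality` scores lowest, which is the over-optimization gap in miniature.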

Two companion papers address automated grading systems using LLMs. arXiv:2603.00451v1, “Confusion-Aware Rubric Optimization for LLM-based Automated Grading,” notes that “accurate and unambiguous guidelines are critical for large language model (LLM) based graders, yet manually crafting these prompts is often sub-optimal as LLMs can misinterpret expert guidelines or lack necessary domain specificity.”
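To make the setting concrete, here is a minimal sketch of what a rubric-based grading prompt for an LLM grader might look like. The rubric criteria, point values, and prompt wording are illustrative assumptions, not the paper's optimized rubrics or its confusion-aware procedure.

```python
# Hypothetical sketch: assembling a rubric-based grading prompt for an LLM
# grader. Everything below (criteria, weights, format) is an assumption
# for illustration, not taken from arXiv:2603.00451v1.

RUBRIC = [
    ("Correctness", 2, "Final answer matches the expected result."),
    ("Reasoning", 2, "Steps are logically valid and complete."),
    ("Clarity", 1, "Explanation is unambiguous and well organized."),
]

def build_grading_prompt(question: str, student_answer: str) -> str:
    """Render the rubric into an explicit, unambiguous grading instruction."""
    lines = [
        "Grade the student answer against each rubric criterion below.",
        f"Question: {question}",
        f"Student answer: {student_answer}",
        "Rubric:",
    ]
    for name, points, description in RUBRIC:
        lines.append(f"- {name} ({points} pts): {description}")
    lines.append("Return one line per criterion: <name>: <points awarded>.")
    return "\n".join(lines)

prompt = build_grading_prompt("What is 6 * 7?", "6 * 7 = 42, by repeated addition.")
```

The paper's point is that the quality of the rubric text itself (its specificity and freedom from ambiguity) largely determines how reliably the grader follows it, which is why hand-written rubrics are a weak link.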

Meanwhile, arXiv:2603.00465v1, “Optimizing In-Context Demonstrations for LLM-based Automated Grading,” focuses on using in-context learning (ICL) for grading open-ended student responses. The paper highlights that “automated assessment of open-ended student responses is a critical capability for scaling personalized feedback in education,” though it notes reliability concerns with current LLM-based approaches.
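The ICL setup described above can be sketched as follows. Selecting demonstrations by crude token overlap is an illustrative assumption, not the paper's optimization procedure; the graded pool and scoring scale are likewise hypothetical.

```python
# Minimal sketch of in-context learning (ICL) for grading: pick the graded
# examples most similar to the new response and prepend them as few-shot
# demonstrations. The similarity measure (token overlap) and the data are
# illustrative assumptions, not the method of arXiv:2603.00465v1.

GRADED_POOL = [
    ("Photosynthesis converts light energy into chemical energy.", 3),
    ("Plants eat sunlight.", 1),
    ("Chlorophyll absorbs light to drive sugar synthesis.", 3),
]

def overlap(a: str, b: str) -> float:
    """Crude lexical similarity: Jaccard overlap of lowercased tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def build_icl_prompt(new_response: str, k: int = 2) -> str:
    """Prepend the k most similar graded examples as demonstrations."""
    demos = sorted(GRADED_POOL, key=lambda ex: overlap(ex[0], new_response),
                   reverse=True)[:k]
    parts = ["Grade each response from 0 to 3."]
    for text, score in demos:
        parts.append(f"Response: {text}\nScore: {score}")
    parts.append(f"Response: {new_response}\nScore:")
    return "\n\n".join(parts)

prompt = build_icl_prompt("Light energy becomes chemical energy in plants.")
```

The reliability concern the paper raises lives exactly in this selection step: which demonstrations are shown, and in what order, can shift the grade the model assigns to the same response.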