Three New arXiv Papers Address LLM Evaluation, Recommendation Systems, and Reinforcement Learning

Researchers publish papers on improving LLM judges with tools, user-controlled recommendations, and dynamic resource allocation in RL.

Three new arXiv papers cover tool use in LLM judges, user-promptable recommendation, and rollout allocation in reinforcement learning for LLM reasoning:

Tool-Integrated LLM Judges

The paper “Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning” (arXiv:2510.23038v2) starts from the observation that “Large Language Models (LLMs) are widely used as judges to evaluate response quality, providing a scalable alternative to human evaluation.” The authors argue that current LLM judges “operate solely on intrinsic text-based reasoning,” which limits their capabilities. As the title indicates, the paper turns to tool-integrated reinforcement learning to address this.

Promptable Recommendations

A paper titled “Give Users the Wheel: Towards Promptable Recommendation Paradigm” (arXiv:2602.18929v1) addresses limitations in sequential recommendation models. According to the abstract, while “conventional sequential recommendation models have achieved remarkable success in mining implicit behavioral patterns,” these systems “remain structurally blind to explicit user intent” and “struggle to adapt when a user’s immediate goal” changes.

Dynamic Resource Allocation in RL

The third paper, “Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization” (arXiv:2602.19208v1), states that “Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for Large Language Model (LLM) reasoning,” but identifies challenges in current methods, including “uniform rollout allocation.”
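
To make the term concrete, the following is a minimal sketch of what uniform rollout allocation can look like. It is not the paper's method. It assumes binary verifiable rewards and a group-normalized advantage of the kind used in GRPO-style training, which the abstract excerpt quoted above does not confirm. The function names, the solve probabilities, and the budget are all hypothetical.

```python
import random
from statistics import mean, pstdev


def uniform_allocation(num_prompts, budget):
    """Split a rollout budget evenly across prompts.

    Any remainder is handed out one rollout at a time to the first prompts.
    """
    base, extra = divmod(budget, num_prompts)
    return [base + (1 if i < extra else 0) for i in range(num_prompts)]


def group_advantages(rewards):
    """Assumed group-normalized advantage: (reward - group mean) / group std.

    If every rollout in the group has the same reward, the spread is zero
    and every advantage is defined as 0.0 here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def sample_rewards(solve_prob, n, rng):
    """Draw n binary rewards: 1.0 if the simulated rollout is correct, else 0.0."""
    return [1.0 if rng.random() < solve_prob else 0.0 for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical prompts: never solved, sometimes solved, always solved.
    solve_probs = [0.0, 0.5, 1.0]
    allocation = uniform_allocation(len(solve_probs), budget=24)
    for p, n in zip(solve_probs, allocation):
        advantages = group_advantages(sample_rewards(p, n, rng))
        has_signal = any(a != 0.0 for a in advantages)
        print(f"solve_prob={p}: rollouts={n}, nonzero advantage={has_signal}")
```

Under these assumptions, the prompts that are never or always solved receive the same number of rollouts as the others but yield only zero advantages, so that share of the budget contributes no learning signal. This is one plausible reading of why a uniform split might be considered wasteful; the paper itself should be consulted for the issues it actually identifies.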