Three separate research papers published on arXiv address fundamental challenges in large language model reasoning and long-context processing.
In the first paper, posted to arXiv, researchers introduced a subgoal-driven framework combining online planning with MiRA (Milestoning your Reinforcement Learning Enhanced Agent), an RL training system that uses milestone-based rewards. The framework improved Gemini’s success rate on the WebArena-Lite benchmark by approximately 10 percentage points. When applied to the open Gemma3-12B model, MiRA raised the success rate from 6.4% to 43.0%, surpassing GPT-4-Turbo (17.6%), GPT-4o (13.9%), and the previous open-model state of the art, WebRL (38.4%).
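The paper does not spell out its reward function, but the idea of milestone-based rewards can be sketched as crediting intermediate subgoals rather than only final task success. The milestone names and the scoring scheme below are illustrative assumptions, not details from the paper:

```python
# Hedged sketch of milestone-based reward shaping: reward reflects how many
# predefined subgoal milestones a trajectory reached, plus a terminal bonus.
def milestone_reward(visited_states, milestones, final_success):
    """Reward = fraction of milestones reached + 1.0 bonus on task success."""
    reached = sum(1 for m in milestones if m in visited_states)
    shaped = reached / len(milestones) if milestones else 0.0
    return shaped + (1.0 if final_success else 0.0)

# Example: a web task with three hypothetical subgoal milestones,
# where the agent reached two of them but did not finish the task.
trajectory = {"opened_search", "selected_item"}
milestones = ["opened_search", "selected_item", "submitted_form"]
partial = milestone_reward(trajectory, milestones, final_success=False)  # 2/3
```

Dense intermediate rewards of this kind are a common remedy for sparse success signals in long-horizon agent tasks, which is consistent with the milestone framing described above.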
In a separate arXiv study, researchers identified what they call the “α-law” governing how instruction-tuned LLMs revise probability assignments. Testing across 4,975 problems drawn from benchmarks including GPQA Diamond and MMLU-Pro, they found that GPT-5.2 and Claude Sonnet 4 exhibit “near-Bayesian update behavior,” with models operating “slightly above the stability boundary in single-step revisions.”
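The paper’s exact functional form is not given here, but one standard way to parameterize near-Bayesian revision is a tempered update in log-odds space, where an exponent α scales the evidence term: α = 1 recovers the exact Bayes rule, and α slightly above 1 over-updates. This parameterization is an assumption for illustration, not necessarily the paper’s α-law:

```python
import math

def alpha_update(prior, likelihood_ratio, alpha=1.0):
    """Tempered Bayesian update on a binary hypothesis:
    posterior log-odds = prior log-odds + alpha * log(likelihood ratio).
    alpha = 1 is exactly Bayesian; alpha > 1 overshoots the Bayes posterior."""
    log_odds = math.log(prior / (1 - prior)) + alpha * math.log(likelihood_ratio)
    return 1 / (1 + math.exp(-log_odds))

p_bayes = alpha_update(0.30, 4.0, alpha=1.0)  # exact Bayes: 12/19 ≈ 0.632
p_over = alpha_update(0.30, 4.0, alpha=1.2)   # slight over-update, > p_bayes
```

Under this reading, “slightly above the stability boundary” would correspond to a fitted α marginally greater than the value at which repeated single-step revisions remain well behaved.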
A third arXiv paper introduces λ-RLM, which replaces free-form recursive code generation with a typed functional runtime based on the λ-calculus. According to the researchers, λ-RLM outperformed standard Recursive Language Models in 29 of 36 model-task comparisons across nine base models, improving average accuracy by up to 21.9 points and reducing latency by up to 4.1×.
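The contrast with free-form code generation can be sketched with a toy typed term language: instead of `exec()`-ing arbitrary model output, the runtime only evaluates a small set of typed term constructors and rejects ill-formed applications before running them. This miniature term language is an illustrative assumption, not λ-RLM’s actual runtime:

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Lit:
    """Literal integer value."""
    value: int

@dataclass
class Lam:
    """λ-abstraction, represented as a host-language closure on terms."""
    fn: Callable[["Term"], "Term"]

@dataclass
class App:
    """Application of a function term to an argument term."""
    fn: "Term"
    arg: "Term"

Term = Union[Lit, Lam, App]

def eval_term(t: Term) -> Term:
    """Evaluate a term; applying a non-function is rejected, not executed."""
    if isinstance(t, App):
        fn = eval_term(t.fn)
        if not isinstance(fn, Lam):
            raise TypeError("cannot apply a non-function term")
        return eval_term(fn.fn(eval_term(t.arg)))
    return t  # Lit and Lam are already values

# (λx. x + 1) applied to 41 evaluates to 42.
inc = Lam(lambda x: Lit(x.value + 1))
result = eval_term(App(inc, Lit(41)))  # Lit(42)
```

Restricting generation to a closed term language like this is what makes malformed recursive calls detectable before execution, which plausibly contributes to the accuracy and latency gains reported above.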