Three New Papers Address Key Challenges in Large Language Model Context and Agent Training

New arXiv papers tackle LLM context efficiency, agent evaluation, and reinforcement learning for multi-turn decision-making.

Three new papers on arXiv address fundamental challenges in large language model (LLM) capabilities and training.

Doc-to-LoRA (arXiv:2602.15902v1) tackles the computational bottleneck of processing long input sequences. According to the abstract, “long input sequences are central to in-context learning, document understanding, and multi-step reasoning of Large Language Models,” but “the quadratic attention cost of Transformers makes inference memory-intensive and slow.” The paper proposes a method to “instantly internalize contexts,” though the abstract excerpt does not detail the specific approach.
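The quadratic cost the abstract refers to is easy to see in numbers. The sketch below (illustrative only, not from the paper; head count and precision are assumed values) estimates the memory needed just for one layer's attention score matrices as the sequence length grows:

```python
# Illustrative sketch (not from the paper): why long contexts are costly.
# Naive Transformer self-attention materializes an n x n score matrix per
# head, so memory for attention scores grows quadratically in sequence
# length n. Head count and fp16 precision below are assumed for illustration.

def attention_score_bytes(seq_len: int, num_heads: int = 32,
                          bytes_per_value: int = 2) -> int:
    """Approximate bytes for one layer's attention score matrices,
    ignoring optimizations such as FlashAttention."""
    return num_heads * seq_len * seq_len * bytes_per_value

for n in (1_024, 8_192, 65_536):
    gib = attention_score_bytes(n) / 2**30
    print(f"seq_len={n:>6}: ~{gib:,.1f} GiB per layer")
```

Doubling the context quadruples this cost, which is the bottleneck motivating methods that internalize context instead of re-attending to it at inference time.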

Proxy State-Based Evaluation (arXiv:2602.16246v1) addresses evaluation challenges for multi-turn tool-calling LLM agents. The paper notes that “interactive large language model (LLM) agents operating via multi-turn dialogue and multi-step tool calling are increasingly used in production,” and proposes benchmarks that “both reliably compare models and yield on-policy training data.”
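To make the setting concrete, here is a minimal sketch (hypothetical, not the paper's method) of the kind of multi-turn tool-calling loop such benchmarks evaluate; the key point is that the agent's final task state, not just its text output, is what an evaluation would need to inspect:

```python
# Illustrative sketch (hypothetical, not the paper's method): a minimal
# multi-turn tool-calling agent loop. The model either calls a tool or
# replies; the transcript plus resulting state is what gets evaluated.

def run_agent(model, tools: dict, user_msg: str, max_turns: int = 5):
    """Drive a simple agent loop; return the accumulated state."""
    state = {"messages": [("user", user_msg)]}
    for _ in range(max_turns):
        action = model(state["messages"])       # model decides the next step
        if action["type"] == "tool_call":
            result = tools[action["name"]](**action["args"])  # run the tool
            state["messages"].append(("tool", (action["name"], result)))
        else:                                   # plain reply ends the episode
            state["messages"].append(("assistant", action["text"]))
            break
    return state

# Toy model: call a calculator tool once, then answer with its result.
def toy_model(messages):
    if messages[-1][0] == "user":
        return {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 3}}
    tool_result = messages[-1][1][1]
    return {"type": "reply", "text": f"The answer is {tool_result}."}

state = run_agent(toy_model, {"add": lambda a, b: a + b}, "What is 2 + 3?")
print(state["messages"][-1][1])
```

Judging such trajectories is hard precisely because correctness depends on intermediate tool calls and end state, not a single final string, which is the gap the proposed benchmarks target.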

HiPER (arXiv:2602.16165v1) introduces a hierarchical reinforcement learning approach for LLM agents. According to the abstract, the work focuses on “multi-turn decision-making” which “remains challenging, particularly in long-horizon tasks with sparse and delayed rewards, where agents must execute extended sequences of actions before receiving meaningful feedback.”
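The sparse, delayed-reward setting the abstract describes can be sketched in a few lines (a hypothetical toy environment, not from the paper): the agent acts for many steps, but only the terminal step carries any learning signal, so credit must be assigned across the entire trajectory.

```python
# Illustrative sketch (hypothetical environment, not from the paper):
# a long-horizon episode where every intermediate step yields zero
# reward and only the final step is scored.

def run_episode(policy, horizon: int = 50) -> list[float]:
    """Return per-step rewards: zeros everywhere except the terminal step."""
    rewards = []
    state = 0
    for _ in range(horizon):
        state += policy(state)                  # agent acts; state updates
        rewards.append(0.0)                     # no intermediate feedback
    rewards[-1] = 1.0 if state > 0 else 0.0     # reward arrives only at the end
    return rewards

rewards = run_episode(lambda s: 1, horizon=50)
print(sum(1 for r in rewards if r != 0), "of", len(rewards), "steps carry reward")
```

Hierarchical approaches like the one HiPER proposes aim to break such long trajectories into subgoals so that useful learning signal reaches earlier decisions.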

Though they approach it from different angles, all three papers share one underlying goal: improving LLM performance in extended, multi-step scenarios, whether through efficient long-context handling, reliable agent evaluation, or long-horizon reinforcement learning.