New Training Method Improves Long-Context Reasoning in Large Language Models

Researchers have developed ProxyCoT, a training framework intended to improve how large language models reason over long contexts, according to a paper accepted to ACL 2026 and posted on arxiv.org.

According to the research, recent large language models support inputs of up to 10 million tokens, yet “perform poorly on long-context tasks that require complex reasoning.” The paper notes that such tasks can be solved using only a subset of the input, a “proxy context,” rather than the full sequence, but that models nonetheless exhibit “a significant performance disparity between proxy and full contexts.”

ProxyCoT addresses this gap in two stages, according to the paper. It first obtains “high-quality chain-of-thought reasoning traces on proxy contexts through reinforcement learning or distillation from a larger teacher model,” then grounds these traces in the full long contexts through supervised fine-tuning. The authors report that, in experiments across different datasets, ProxyCoT “consistently outperforms strong baselines with reduced computational overhead,” and that models trained with this approach “generalize their long-context reasoning capabilities to out-of-domain tasks.”
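
To make the grounding step concrete, here is a minimal sketch of how supervised fine-tuning examples might be assembled once proxy-context traces exist. It is an illustration under stated assumptions, not the paper's implementation: the ProxyTrace record, its field names, the prompt layout, and the substring check are all hypothetical, and the first stage (reinforcement learning or distillation) is assumed to have already produced the reasoning text.

```python
from dataclasses import dataclass


@dataclass
class ProxyTrace:
    """Hypothetical record for one stage-1 result (names are assumptions)."""
    question: str
    full_context: str   # the entire long input
    proxy_context: str  # the subset assumed sufficient to answer
    reasoning: str      # chain-of-thought obtained on the proxy context
    answer: str


def is_grounded(trace: ProxyTrace) -> bool:
    """Assumed sanity check: the proxy text must appear in the full context."""
    return trace.proxy_context in trace.full_context


def build_sft_example(trace: ProxyTrace) -> dict:
    """Pair the FULL context with the trace derived from the PROXY context."""
    prompt = f"{trace.full_context}\n\nQuestion: {trace.question}\n"
    completion = f"{trace.reasoning}\nAnswer: {trace.answer}"
    return {"prompt": prompt, "completion": completion}


def build_sft_dataset(traces: list[ProxyTrace]) -> list[dict]:
    """Keep only traces whose proxy context is found in the full input."""
    return [build_sft_example(t) for t in traces if is_grounded(t)]
```

In this sketch the model would be trained to reproduce reasoning that was obtained on the short input while conditioning on the long one, which is the sense in which the traces are grounded in the full context.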

A separate study posted on arxiv.org, which examines positional failures in long-context models, found that mainstream reasoning benchmarks “do not control positional placement of target tasks in long contexts,” suggesting that evaluation frameworks with controlled target placement are needed to assess long-context performance.
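
As a rough illustration of what controlling positional placement could look like, the sketch below inserts a target passage at a chosen relative depth among filler passages. The function name, the depth parameter, and the filler-passage setup are assumptions made for this example and are not taken from the study.

```python
def place_target(fillers: list[str], target: str, depth: float) -> list[str]:
    """Insert `target` at a relative depth: 0.0 is the start, 1.0 is the end."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = int(depth * len(fillers))  # number of fillers placed before the target
    return fillers[:index] + [target] + fillers[index:]


def build_position_sweep(fillers: list[str], target: str,
                         depths: list[float]) -> dict[float, str]:
    """Build one long context per depth so results can be compared by position."""
    return {d: "\n".join(place_target(fillers, target, d)) for d in depths}
```

Evaluating the same task at several depths, for example 0.0, 0.5, and 1.0, would separate failures caused by where the target sits from failures caused by the task itself.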