New Research Explores Training Methods and Optimization Techniques for Large Language Models

Three new arXiv papers address LLM mathematical reasoning, inference optimization, and training stability challenges.


Three recent papers on arXiv examine different aspects of large language model development and optimization.

Adaptive Mathematical Reasoning

According to arXiv paper 2502.12022v4, researchers are exploring adaptive approaches to mathematical reasoning that combine Chain-of-Thought (CoT) methods with Tool-Integrated Reasoning (TIR). The paper notes that existing approaches rely either on CoT for generalizability or on TIR for precise computation, and it explores combining the two within a single adaptive framework.
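To make the CoT/TIR distinction concrete, here is a minimal illustrative sketch, not the paper's method: a toy router sends purely arithmetic queries to a "tool" path (a sandboxed expression evaluator standing in for a real calculator tool) and everything else to a free-form reasoning path. The function names and the regex-based routing heuristic are assumptions for illustration only.

```python
# Illustrative sketch only -- NOT the method from arXiv:2502.12022.
# Adaptive routing between tool-integrated reasoning (TIR) and
# chain-of-thought (CoT): arithmetic goes to an exact "tool",
# open-ended questions go to free-form reasoning.
import re

def solve_with_tir(expression: str) -> str:
    # Tool call: delegate exact arithmetic to a sandboxed evaluator
    # (stand-in for a real calculator / code-interpreter tool).
    result = eval(compile(expression, "<expr>", "eval"), {"__builtins__": {}})
    return f"Computed via tool: {expression} = {result}"

def solve_with_cot(question: str) -> str:
    # Placeholder for free-form chain-of-thought text from an LLM.
    return f"Reasoning step by step about: {question}"

def adaptive_solve(question: str) -> str:
    # Crude heuristic router (an assumption for this sketch):
    # pure arithmetic expressions take the tool path.
    if re.fullmatch(r"[\d\s\+\-\*/\(\)\.]+", question):
        return solve_with_tir(question)
    return solve_with_cot(question)

print(adaptive_solve("12 * (7 + 5)"))          # tool path -> exact result 144
print(adaptive_solve("Why is the sky blue?"))  # CoT path
```

In practice the routing decision would itself be learned rather than rule-based; the sketch only shows why the two paths trade off generality against computational precision.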

Long-Context Inference Optimization

A separate paper (arXiv:2506.08373v3) addresses the optimization of inference for long-context LLMs, noting that this is “increasingly important due to the quadratic compute and linear memory cost of Transformers.” The research examines draft-based approximate inference methods alongside existing approaches such as key-value (KV) cache dropping.
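As a rough illustration of what KV-cache dropping means, here is a minimal sketch of one generic eviction policy (retain a few initial "sink" entries plus a sliding window of recent entries). This is an assumption-laden toy, not the policy studied in the paper; the class name and parameters are invented for illustration.

```python
# Illustrative sketch only -- a generic KV-cache dropping policy,
# NOT the method from arXiv:2506.08373. Memory stays bounded by
# keeping a few early "sink" entries plus a recent-token window,
# instead of the full linear-in-context KV cache.
from collections import deque

class DroppingKVCache:
    def __init__(self, num_sinks: int = 4, window: int = 8):
        self.num_sinks = num_sinks
        self.sinks = []                       # earliest entries, always kept
        self.recent = deque(maxlen=window)    # sliding window; auto-evicts oldest

    def append(self, kv_entry):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(kv_entry)
        else:
            self.recent.append(kv_entry)      # deque drops the oldest entry

    def retained(self):
        # The attention computation would only see these entries.
        return self.sinks + list(self.recent)

cache = DroppingKVCache(num_sinks=2, window=3)
for t in range(10):                           # positions 0..9 arrive in order
    cache.append(t)
print(cache.retained())                       # [0, 1, 7, 8, 9]
```

The point of the sketch is the trade-off the paper's setting implies: a bounded cache caps memory regardless of context length, at the cost of approximating attention over the dropped middle of the sequence.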

Training Stability Challenges

According to arXiv paper 2602.01103v1, prolonged reinforcement learning with verifiable rewards (RLVR) “has been shown to drive continuous improvements in the reasoning capabilities of large language models,” but the paper notes that training is “often prone to instabilities, especially in Mixture-of-Experts” models. The research examines these instabilities through what it calls “objective-level hacking.”