Three New ArXiv Papers Advance LLM Fine-Tuning and Reasoning Capabilities

New Research Addresses LLM Optimization Challenges

Three recent papers on arXiv explore different approaches to improving large language model capabilities:

Parameter-Efficient Fine-Tuning with Mixture of Space Experts

ArXiv paper 2602.14490v1 proposes a new Parameter-Efficient Fine-Tuning (PEFT) method for adapting LLMs to downstream tasks. The authors note that “existing PEFT methods mainly operate in Euclidean space,” suggesting that the proposed method, built around a mixture of space experts, looks beyond a purely Euclidean geometry.
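For context, a widely used PEFT technique is the LoRA-style low-rank adapter, which adds a small trainable matrix product to a frozen weight using ordinary linear algebra. The summary above does not say which existing methods the paper has in mind, so treating LoRA as an example of a Euclidean-space method is an assumption, and the sketch below is not the paper's method. The class name `LowRankAdapter` and the helper `matvec` are hypothetical names chosen for illustration.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


class LowRankAdapter:
    """Frozen weight W plus a trainable low-rank update B @ A (LoRA-style).

    Illustrative only: this is a generic adapter, not the method
    proposed in arXiv 2602.14490v1.
    """

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # pretrained weight, kept frozen
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the pretrained layer's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                   # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))  # adapter path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would receive gradient updates during fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# A 4x6 weight has 24 entries; a rank-2 adapter trains 2*6 + 4*2 = 20.
W = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(4)]
layer = LowRankAdapter(W, rank=2)
print(layer.trainable_params())
```

The saving is negligible at this toy size but grows with layer width, since the adapter's parameter count scales with rank times (d_in + d_out) rather than d_in times d_out.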

On-Policy Supervised Fine-Tuning for Reasoning

ArXiv paper 2602.13407v1 examines Large Reasoning Models (LRMs), which the authors state “are commonly trained with reinforcement learning (RL) to explore long chain-of-thought reasoning, achieving strong performance at high computational cost.” The authors add that “recent methods add multi-reward objectives to jointly optimize” these models, with the aim of making reasoning more efficient.
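To make the idea of a multi-reward objective concrete, here is a minimal sketch of a scalar reward that combines answer correctness with a penalty for overly long reasoning. The quoted abstract does not specify which rewards are combined, so the two terms, the function name `combined_reward`, and the parameters `token_budget` and `length_weight` are all assumptions for illustration, not the paper's formulation.

```python
def combined_reward(is_correct, num_tokens, token_budget=1024, length_weight=0.2):
    """Hypothetical two-term reward: correctness minus a length penalty.

    Assumed form, not taken from arXiv 2602.13407v1.
    """
    correctness = 1.0 if is_correct else 0.0
    # Fraction by which the response exceeds the budget, capped at 1.0.
    overflow = max(0, num_tokens - token_budget) / token_budget
    length_penalty = min(1.0, overflow)
    return correctness - length_weight * length_penalty


# A correct answer within budget keeps the full reward;
# a correct answer at twice the budget loses the full length penalty.
print(combined_reward(True, 512))
print(combined_reward(True, 2048))
```

Under this assumed form, `length_weight` controls how much accuracy the training signal is willing to trade for shorter chains of thought.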

REDSearcher Framework for Search Agents

The third paper (arXiv 2602.14234v1) introduces REDSearcher, described as “a scalable and cost-efficient framework for long-horizon search agents.” According to the abstract, the work starts from the observation that “large language models are transitioning from general-purpose knowledge engines to real-world problem solvers,” and identifies “the extreme sparsity of high-quality search trajectories” as the “central bottleneck” in optimizing for deep search tasks.