Three New arXiv Papers Examine LLM Routing, Training, and Safety Concerns

Recent preprints address router evaluation in collaborative LLM systems, on-policy training methods, and alignment risks in self-evolving agents.


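Three recent preprints on arXiv address different aspects of large language model (LLM) development and deployment: how to evaluate query routers, how to train models more effectively, and how self-evolving agents can drift out of alignment.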

Router Evaluation for Collaborative LLM Systems

The first paper (arXiv:2602.11877v1) works toward “fair and comprehensive evaluation of routers in collaborative LLM systems.” The paper notes that although LLMs have achieved success, “cost and privacy constraints necessitate deploying smaller models locally while offloading complex queries to cloud-based models.” In such a setup, the router decides which model handles each query. The authors argue that existing router evaluations are “unsystematic, overlooking scenar[ios]” (the abstract text appears truncated at this point).
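To make the local-versus-cloud trade-off concrete, here is a toy sketch. It is not the paper's method or benchmark: a hypothetical threshold router sends a query to the cloud model when a difficulty score exceeds a threshold, and a hypothetical `evaluate` function reports accuracy, offload rate, and total cost. The `Query` fields, the scores, and the unit costs are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Query:
    difficulty: float    # hypothetical router score in [0, 1]; higher means harder
    local_correct: bool  # would the small local model answer correctly?
    cloud_correct: bool  # would the cloud model answer correctly?


def route(q: Query, threshold: float) -> str:
    """Offload to the cloud model when the difficulty score exceeds the threshold."""
    return "cloud" if q.difficulty > threshold else "local"


def evaluate(queries, threshold, cloud_cost=1.0, local_cost=0.0):
    """Return (accuracy, offload_rate, total_cost) for one threshold setting."""
    correct = 0
    offloaded = 0
    cost = 0.0
    for q in queries:
        if route(q, threshold) == "cloud":
            offloaded += 1
            cost += cloud_cost
            correct += int(q.cloud_correct)
        else:
            cost += local_cost
            correct += int(q.local_correct)
    n = len(queries)
    return correct / n, offloaded / n, cost


# Made-up queries, purely illustrative.
queries = [
    Query(0.1, True, True),
    Query(0.4, True, True),
    Query(0.7, False, True),
    Query(0.9, False, False),
]

# Sweep several operating points instead of reporting a single one.
for t in (0.0, 0.5, 1.0):
    acc, rate, cost = evaluate(queries, t)
    print(f"threshold={t:.1f} accuracy={acc:.2f} offload={rate:.2f} cost={cost:.1f}")
```

Sweeping the threshold traces an accuracy-versus-cost curve. With these four made-up queries, thresholds 0.0 and 0.5 both reach 0.75 accuracy, but the latter offloads only half the queries; a router reported at a single operating point would hide that trade-off.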

On-Policy Training Methods

A second paper (arXiv:2602.12222v1) proposes bridging the gap between supervised fine-tuning (SFT) and reinforcement learning approaches. According to the abstract, “supervised fine-tuning (SFT) is computationally efficient but often yields inferior generalization compared to reinforcement learning (RL).” The researchers attribute this gap “primarily” to “RL’s use of on-policy data” and propose a framework to address this difference.
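The distinction between fixed demonstrations and on-policy data can be illustrated with a toy model: a single softmax over three candidate answers. This is a minimal sketch that assumes a REINFORCE-style policy-gradient update as a stand-in for RL; it is not the framework proposed in the paper. The SFT step always pushes toward a fixed demonstration, while the on-policy step first samples an answer from the model's current distribution and then scales the update by a reward. All names (`sft_step`, `on_policy_step`, `reward`) are hypothetical.

```python
import math
import random

ANSWERS = ["a", "b", "c"]  # toy "vocabulary" of possible responses


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, idx):
    """Gradient of log softmax(logits)[idx] with respect to the logits."""
    probs = softmax(logits)
    return [(1.0 if i == idx else 0.0) - p for i, p in enumerate(probs)]


def sft_step(logits, demo_idx, lr=0.5):
    """SFT-style update: raise the log-probability of a fixed demonstration,
    regardless of what the model itself would generate."""
    g = grad_log_prob(logits, demo_idx)
    return [x + lr * gi for x, gi in zip(logits, g)]


def on_policy_step(logits, reward_fn, rng, lr=0.5):
    """REINFORCE-style update: sample an answer from the current model
    (on-policy data), then scale its log-probability gradient by the reward."""
    probs = softmax(logits)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    r = reward_fn(ANSWERS[idx])
    g = grad_log_prob(logits, idx)
    return [x + lr * r * gi for x, gi in zip(logits, g)]


def reward(answer):
    """Hypothetical reward: only answer "b" counts as correct."""
    return 1.0 if answer == "b" else 0.0


rng = random.Random(0)
sft_logits = [0.0, 0.0, 0.0]
rl_logits = [0.0, 0.0, 0.0]
for _ in range(50):
    sft_logits = sft_step(sft_logits, demo_idx=1)       # demonstration is always "b"
    rl_logits = on_policy_step(rl_logits, reward, rng)  # data drawn from the model itself

print("SFT p(b):", round(softmax(sft_logits)[1], 3))
print("On-policy p(b):", round(softmax(rl_logits)[1], 3))
```

The key difference is where each training example comes from: `sft_step` never consults the model's own outputs, whereas every `on_policy_step` update is computed on an answer the current model actually sampled.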

Alignment Risks in Self-Evolving Agents

The third paper (arXiv:2510.04860v2) identifies what researchers call the “Alignment Tipping Process.” According to the abstract, as “LLM agents increasingly gain self-evolutionary capabilities to adapt and refine their strategies through real-world interaction, their long-term reliability becomes a critical concern.”