Three New arXiv Papers Address Code LLMs, Reasoning Efficiency, and Long-Form Audio Processing

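Three new research papers on arXiv address distinct challenges in AI systems: reward modeling for code LLMs, the efficiency of inference-time reasoning, and question answering over long audio recordings.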

CodeScaler (arXiv:2602.17684v1) introduces execution-free reward models for scaling code large language models. According to the abstract, the paper addresses limitations in Reinforcement Learning from Verifiable Rewards (RLVR), which has “driven recent progress in code large language models by leveraging execution-based feedback from unit tests,” but faces scalability constraints due to “availability and reliability” issues with execution-based testing.

Thinking by Subtraction (arXiv:2602.18232v1) presents a confidence-driven contrastive decoding approach for LLM reasoning. The paper challenges assumptions in test-time scaling, noting that “recent work on test-time scaling for large language model (LLM) reasoning typically assumes that allocating more inference-time computation uniformly improves correctness.” However, according to the abstract, “prior studies show that reasoning uncertainty is highly localized.”

LongAudio-RAG (arXiv:2602.14612v2) tackles question answering over multi-hour audio recordings. The abstract states that “long-duration audio is increasingly common in industrial and consumer settings, yet reviewing multi-hour recordings is impractical,” motivating systems that can “answer natural-language queries with precise temporal grounding and minimal hallucination.”

All three papers are cross-listed in the cs.AI (Artificial Intelligence) category on arXiv.