Three New Studies Explore Test-Time Computation Scaling in AI Models

Researchers publish papers on improving inference-time computation for language models, diffusion models, and audio-language emotion recognition.

Three new arXiv papers explore different approaches to scaling test-time computation in AI systems.

According to arXiv:2602.03975v1, researchers have developed methods for adaptive test-time compute allocation in large language models (LLMs). The paper observes that “a large fraction of verifier calls are spent on redundant” checks and proposes learned heuristics over categorical structure to make reasoning more efficient. The authors note that test-time computation “has become a primary driver of progress in large language model reasoning” but is “increasingly bottlenecked by expensive verification.”
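The paper's method is not spelled out here, but the redundancy it describes can be illustrated with a toy sketch: if many sampled reasoning traces converge on the same final answer, caching verifier verdicts per answer avoids re-verifying duplicates. All names below (`generate_candidate`, `expensive_verify`) are hypothetical stubs, not APIs from the paper.

```python
def generate_candidate(i, candidates=(41, 42, 42, 43, 42)):
    """Stub standing in for sampling one LLM reasoning trace.

    Real systems would decode a fresh trace each call; here a fixed
    sequence of final answers mimics the common case where many
    traces land on the same answer."""
    return candidates[i % len(candidates)]

def expensive_verify(answer):
    """Stub for a costly verifier call (e.g. a checker model or prover)."""
    return answer == 42

def verify_with_dedup(num_samples=5):
    """Call the verifier only once per distinct final answer.

    Redundant candidates share an answer, so one cached verdict
    covers all of them; stop as soon as any answer verifies."""
    verdicts = {}          # answer -> cached verifier verdict
    verifier_calls = 0
    for i in range(num_samples):
        answer = generate_candidate(i)
        if answer not in verdicts:
            verifier_calls += 1
            verdicts[answer] = expensive_verify(answer)
        if verdicts[answer]:
            return answer, verifier_calls
    return None, verifier_calls
```

With the fixed stub sequence above, the first candidate (41) fails verification and the second (42) succeeds, so only two of five potential verifier calls are spent.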

A second paper (arXiv:2602.04344v1) introduces UnMaskFork, which applies test-time scaling to Masked Diffusion Language Models (MDLMs). The research demonstrates that MDLMs “are inherently amenable” to test-time scaling strategies through deterministic action branching, extending techniques previously used primarily with autoregressive models.
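The branching idea can be sketched in miniature: because an MDLM fills in masked positions rather than decoding left to right, one can deterministically fork on the top candidate tokens at each unmasking step and keep the best-scoring partial sequences. The toy "model" below (`toy_logits`) and the beam-style pruning are illustrative assumptions, not UnMaskFork's actual procedure.

```python
import math

MASK = "<m>"
VOCAB = ["a", "b", "c"]

def toy_logits(seq, pos):
    """Stand-in for a masked diffusion model's per-position scores.

    A real MDLM conditions on the whole partially unmasked sequence;
    this toy just prefers a fixed token at each position."""
    prefs = {0: "a", 1: "b", 2: "a"}
    return {tok: (2.0 if tok == prefs[pos] else 0.5) for tok in VOCAB}

def branch_unmask(seq, beam=2):
    """Fill masks one at a time, forking on the top `beam` tokens.

    Each partial sequence expands deterministically on its best
    candidates; only the `beam` highest-scoring sequences survive."""
    frontier = [(0.0, list(seq))]           # (log-score, tokens)
    while any(MASK in s for _, s in frontier):
        expanded = []
        for score, s in frontier:
            pos = s.index(MASK)             # next masked position
            logits = toy_logits(s, pos)
            z = sum(math.exp(v) for v in logits.values())
            top = sorted(logits, key=logits.get, reverse=True)[:beam]
            for tok in top:                 # deterministic fork
                child = list(s)
                child[pos] = tok
                log_p = logits[tok] - math.log(z)
                expanded.append((score + log_p, child))
        frontier = sorted(expanded, key=lambda x: x[0], reverse=True)[:beam]
    return frontier[0]
```

Running `branch_unmask([MASK] * 3)` explores several unmasking branches and returns the completion the toy scores rank highest.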

The third study (arXiv:2602.03873v1) tackles emotion recognition in audio-language models. The paper argues that while “most prior work frames emotion recognition as a categorical classification problem, real-world affective states are often ambiguous.” The researchers apply test-time scaling so that speech-based emotion recognition systems handle this ambiguity better.
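One simple way test-time scaling can surface ambiguity, sketched below under assumed names (the paper's actual method may differ): sample the model several times on the same clip and aggregate the predictions into a distribution over emotions rather than collapsing to one label.

```python
from collections import Counter

EMOTIONS = ["neutral", "happy", "sad", "angry"]

def predict_once(clip_id, sample_idx):
    """Stub for one stochastic decode from an audio-language model.

    A real system would sample a label from the model each call;
    these hard-coded outputs mimic an ambiguous sad/neutral clip."""
    outputs = ["sad", "neutral", "sad", "angry",
               "sad", "neutral", "sad", "sad"]
    return outputs[sample_idx % len(outputs)]

def soft_emotion(clip_id, num_samples=8):
    """Aggregate repeated predictions into a soft distribution.

    More samples (more test-time compute) give a finer-grained
    picture of ambiguity than a single categorical prediction."""
    counts = Counter(predict_once(clip_id, i) for i in range(num_samples))
    return {e: counts[e] / num_samples for e in EMOTIONS}
```

For the stubbed clip this yields a distribution dominated by “sad” but with visible mass on “neutral,” information a single hard label would discard.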

All three papers are new arXiv submissions, spanning distinct AI subfields united by a common theme: spending more computation at inference time rather than only at training time.