Three recent papers on arxiv.org showcase different approaches to inference-time scaling across diverse AI domains.
According to arxiv.org, researchers presented a training-free framework for compositional image synthesis that addresses text-to-image models’ well-known struggles with object counts, attributes, and spatial relations. The framework uses a large language model to synthesize an explicit layout from the prompt, then an object-centric vision-language model judge to iteratively rerank multiple sampled candidates, “achieving stronger scene alignment with prompts compared to recent text-to-image models.”
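The reranking loop at the heart of this approach is essentially best-of-N sampling with a learned judge. The sketch below is a minimal, runnable illustration under stated assumptions: `plan_layout`, `render`, and `judge_score` are hypothetical stubs standing in for the paper's LLM layout planner, layout-conditioned generator, and object-centric VLM judge; none of these names come from the paper.

```python
import random

def plan_layout(prompt, seed):
    """Stub LLM layout planner: may drop some prompt objects, simulating
    the compositional errors the judge is meant to catch."""
    rng = random.Random(seed)
    return [{"object": w} for w in prompt.split() if rng.random() > 0.3]

def render(layout, seed):
    """Stub layout-conditioned generator: just carries the layout through."""
    return {"layout": layout, "seed": seed}

def judge_score(prompt, image):
    """Stub object-centric judge: fraction of prompt objects present."""
    found = {o["object"] for o in image["layout"]}
    words = prompt.split()
    return len(found & set(words)) / len(words)

def best_of_n(prompt, n=8):
    """Inference-time scaling: sample n candidates, keep the judged best."""
    candidates = [render(plan_layout(prompt, s), s) for s in range(n)]
    return max(candidates, key=lambda img: judge_score(prompt, img))
```

Because the winner is a maximum over candidates, the judged score can only improve as n grows, which is what makes this an inference-time scaling method rather than a training change.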
In the domain of formal verification, arxiv.org reports that researchers developed Goedel-Code-Prover-8B, a hierarchical proof search framework for automated code verification in Lean 4. The system decomposes complex verification goals into simpler subgoals before attempting tactic-level proving. On three Lean-based benchmarks comprising 427 tasks, the 8B-parameter model achieved a 62.0% proof success rate, representing “a 2.6× improvement over the strongest baseline, surpassing neural provers up to 84× larger.” The researchers observed that “success rates improve monotonically with search iterations and sampling budget.”
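The decompose-then-prove loop can be illustrated with a small self-contained sketch. Everything below is a hypothetical stand-in, assuming a toy conjunctive decomposer and a stochastic tactic prover rather than the actual Goedel-Code-Prover-8B pipeline; its main point is to show why success is monotone in the sampling budget.

```python
import random

def decompose(goal):
    """Stub hierarchical decomposition: split a conjunctive goal into subgoals."""
    return goal.split(" AND ")

def try_tactic_proof(subgoal, seed):
    """Stub tactic-level prover: one sampled attempt, succeeding stochastically
    (deterministic per (subgoal, seed) so search is reproducible)."""
    return random.Random(f"{subgoal}|{seed}").random() < 0.4

def prove(goal, budget=16):
    """Prove each subgoal within a per-subgoal sampling budget. Enlarging the
    budget only adds attempts, never removes successes, so success rates can
    only improve with budget, consistent with the monotonic scaling the
    researchers report empirically."""
    proofs = {}
    for sub in decompose(goal):
        for seed in range(budget):
            if try_tactic_proof(sub, seed):
                proofs[sub] = f"tactic-proof@seed{seed}"
                break
        else:
            return None  # this subgoal exhausted its budget
    return proofs
```

The decomposition step is what makes the 8B model competitive: each tactic-level attempt faces a simpler goal than the original verification condition.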
According to arxiv.org, NVIDIA researchers introduced Nemotron-Cascade, a model trained with cascaded domain-wise reinforcement learning. The approach addresses heterogeneity challenges in building general-purpose reasoning models, including “large variation in inference-time response lengths and verification latency.” The system can operate in both instruct and deep thinking modes “without any performance gap relative to a thinking-only counterpart.”
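At a high level, a cascade of this kind means running RL one domain at a time, warm-starting each stage from the previous checkpoint, so each stage sees relatively homogeneous response lengths and verifier latency. The toy sketch below assumes nothing about NVIDIA's actual pipeline; `rl_stage` is a placeholder for a full RL run and merely records which domain was trained.

```python
def rl_stage(policy, domain):
    """Placeholder for one domain-specific RL run; here it just tags the
    policy with the domain it was trained on."""
    return policy + [f"rl:{domain}"]

def cascade(base_policy, domains):
    """Cascaded domain-wise RL: train domains sequentially, each stage
    initialized from the checkpoint produced by the previous one."""
    policy = base_policy
    for domain in domains:
        policy = rl_stage(policy, domain)
    return policy
```

The contrast is with a single mixed-domain RL run, where short chat responses and long, slow-to-verify reasoning traces would share one batch; the cascade sidesteps that heterogeneity by construction.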