New OLLM Architecture Reaches Up to ~70% on Math Reasoning Benchmark Under Optimal Latent Selection

Researchers have introduced Options LLM (OLLM), a method that changes how large language models predict the next token during text generation, according to a paper posted to arXiv on April 22, 2026.

According to the paper, OLLM replaces standard single next-token prediction with “a set of learned options for the next token, indexed by a discrete latent variable.” The architecture functions as a “lightweight ‘plug-in’ that inserts two layers: an encoder and a decoder, before the output head, allowing almost any pretrained LLM to be converted with minimal additional parameters.”
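To make the idea of "a set of learned options" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the toy dimensions, and the choice to condition the hidden state by adding a latent embedding are all assumptions made for illustration, and the encoder side is omitted. It only shows how a discrete latent index could select one of several next-token distributions from the same hidden state.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def option_distributions(hidden, latent_embeddings, output_head):
    """Return one next-token distribution per discrete latent index."""
    options = []
    for z_embed in latent_embeddings:
        # Hypothetical decoder: shift the hidden state by the latent's embedding.
        conditioned = [h + e for h, e in zip(hidden, z_embed)]
        options.append(softmax(matvec(output_head, conditioned)))
    return options

# Toy values, chosen by hand for illustration only.
hidden = [0.5, -0.2]                                          # hidden size 2
latent_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]    # 3 options
output_head = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, -1.0]]  # vocab of 4

options = option_distributions(hidden, latent_embeddings, output_head)
# Most likely token under each option.
top_tokens = [max(range(len(p)), key=p.__getitem__) for p in options]
```

In this toy setup each latent index favors a different token, which is the behavior the quoted description suggests: the model exposes several candidate continuations rather than a single prediction.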

The researchers applied OLLM to a 1.7B-parameter model with only 1.56% of parameters trainable, training it on OpenMathReasoning and evaluating on OmniMath. According to the paper, “The SOTA LoRA-adapted baselines peak at 51% final answer correctness, while OLLM’s option set allows up to ~70% under optimal latent selection.”
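The figures above can be put in perspective with a little arithmetic. The sketch below computes the approximate trainable parameter count implied by the reported percentages, and illustrates one possible reading of "optimal latent selection": an oracle that counts a problem as solved if any option leads to a correct answer. That reading is an assumption, and the correctness table is invented toy data, not results from the paper.

```python
# Reported figures: 1.7B parameters, 1.56% trainable.
total_params = 1.7e9
trainable_fraction = 0.0156
trainable_params = total_params * trainable_fraction  # roughly 26.5 million

def oracle_accuracy(per_option_correct):
    """Fraction of problems solved by at least one option (an upper bound)."""
    solved = sum(any(row) for row in per_option_correct)
    return solved / len(per_option_correct)

def single_option_accuracy(per_option_correct, index):
    """Fraction of problems solved when always using one fixed option."""
    solved = sum(row[index] for row in per_option_correct)
    return solved / len(per_option_correct)

# Toy data (hypothetical): rows are problems, columns are options.
toy_results = [
    [True, False, False],
    [False, False, True],
    [False, False, False],
    [True, True, False],
]

oracle = oracle_accuracy(toy_results)                  # 3 of 4 problems
fixed = single_option_accuracy(toy_results, index=0)   # 2 of 4 problems
```

Under this reading, the oracle figure is a ceiling on what a good latent selector could achieve, which is why it can exceed the accuracy of any single fixed option.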

The method also includes training “a compact policy in the latent space that emits latents to control generation.” According to the researchers, operating in this low-dimensional option space “makes reward optimization far more sample-efficient and substantially reduces common misalignments (e.g., language switching or degenerate reasoning).” The paper notes that “this alignment arises from model structure rather than additional KL or handcrafted alignment losses.”
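As a rough illustration of why a small discrete option space can be easy to optimize, the sketch below trains a categorical policy over a handful of latent indices with a basic REINFORCE update. This is a swapped-in textbook technique on a toy bandit problem, not the paper's training procedure: the reward function, hyperparameters, and function names are all hypothetical.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_latent_policy(reward_fn, num_latents, steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline over discrete latents."""
    rng = random.Random(seed)
    logits = [0.0] * num_latents
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        z = rng.choices(range(num_latents), weights=probs)[0]
        reward = reward_fn(z)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(z) with respect to the logits is one_hot(z) - probs.
        for k in range(num_latents):
            grad = (1.0 if k == z else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)

def toy_reward(z):
    """Hypothetical reward: only latent 2 yields a correct answer."""
    return 1.0 if z == 2 else 0.0

policy = train_latent_policy(toy_reward, num_latents=4)
best_latent = max(range(len(policy)), key=policy.__getitem__)
```

Here the policy only has four logits to learn, so the reward signal concentrates quickly on the rewarded latent. A real system would condition the policy on context and score full generations, which this sketch does not attempt.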