OLLM Framework Introduces Options-Based Approach to Large Language Model Training

According to a paper posted on arXiv, researchers have introduced Options LLM (OLLM), a method that replaces the single next-token prediction of a standard large language model with “a set of learned options” for the next token, indexed by a discrete latent variable.

The framework works as a lightweight “plug-in” that inserts two layers, an encoder and a decoder, before the output head, according to the research paper. This allows “almost any pretrained LLM to be converted with minimal additional parameters.”
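To make the plug-in idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the roles given to the encoder and decoder, the option embeddings, the residual connection, and all sizes (`HIDDEN`, `VOCAB`, `NUM_OPTIONS`) are assumptions chosen for illustration. It only shows how a small trainable module placed before a frozen output head could yield one next-token distribution per latent value.

```python
import math
import random

random.seed(0)

HIDDEN = 4       # hidden size of the frozen backbone (toy value)
VOCAB = 6        # vocabulary size (toy value)
NUM_OPTIONS = 3  # number of discrete latent values (hypothetical)


def rand_matrix(rows, cols):
    """Small random matrix standing in for learned weights."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(xs):
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Frozen pretrained output head: hidden state -> vocabulary logits.
output_head = rand_matrix(VOCAB, HIDDEN)

# Hypothetical trainable plug-in parameters (names are illustrative only).
option_embeddings = rand_matrix(NUM_OPTIONS, HIDDEN)  # one vector per latent z
decoder_weights = rand_matrix(HIDDEN, HIDDEN)
encoder_weights = rand_matrix(NUM_OPTIONS, HIDDEN)


def decode_option(hidden, z):
    """Assumed decoder: condition the hidden state on option z, then apply the frozen head."""
    shifted = [h + e for h, e in zip(hidden, option_embeddings[z])]
    adapted = [h + d for h, d in zip(hidden, matvec(decoder_weights, shifted))]
    return softmax(matvec(output_head, adapted))


def encode_posterior(hidden):
    """Assumed encoder: a distribution over latent options given the hidden state."""
    return softmax(matvec(encoder_weights, hidden))


hidden_state = [0.2, -0.1, 0.4, 0.3]
# One next-token distribution per latent value: the "option set".
option_set = [decode_option(hidden_state, z) for z in range(NUM_OPTIONS)]
```

In this sketch only `option_embeddings`, `decoder_weights`, and `encoder_weights` would be trained, which is one way a converted model could keep the share of trainable parameters small.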

In testing, OLLM was applied to a 1.7B-parameter backbone with only 1.56% of parameters trainable, trained on OpenMathReasoning and evaluated on OmniMath. According to the paper, “SOTA LoRA-adapted baselines peak at 51% final answer correctness, while OLLM’s option set allows up to ~70% under optimal latent selection.”
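The reported figures can be unpacked with a little arithmetic. The sketch below first estimates the trainable-parameter count implied by 1.56% of a 1.7B backbone (approximate, since 1.7B is itself rounded), then illustrates what "optimal latent selection" might mean: an oracle that picks, for each problem, whichever option leads to a correct answer. The `correct` table is made-up toy data, not results from the paper, and treating the latent as one choice per problem is an assumption.

```python
# Rough trainable-parameter count implied by the reported figures.
backbone_params = 1.7e9
trainable_fraction = 0.0156
trainable_params = backbone_params * trainable_fraction  # roughly 26.5 million

# Made-up toy results: correct[p][z] is True if problem p is solved under option z.
correct = [
    [True, False, False],
    [False, False, True],
    [False, False, False],
    [False, True, False],
    [True, True, False],
]
num_options = len(correct[0])


def fixed_option_accuracy(results, z):
    """Accuracy when the same option z is used for every problem."""
    return sum(row[z] for row in results) / len(results)


def oracle_accuracy(results):
    """Accuracy when an oracle picks the best option for each problem."""
    return sum(any(row) for row in results) / len(results)


best_fixed = max(fixed_option_accuracy(correct, z) for z in range(num_options))
oracle = oracle_accuracy(correct)
```

The gap between `best_fixed` and `oracle` in the toy data mirrors the kind of headroom the quoted 51% and ~70% figures describe: the oracle number is an upper bound that a learned selection policy can approach but not exceed.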

The researchers trained a compact policy in the latent space to control generation. According to the paper, “Operating in a low-dimensional option space makes reward optimization far more sample-efficient and substantially reduces common misalignments (e.g., language switching or degenerate reasoning).”
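As a rough illustration of why a small discrete option space can make reward optimization tractable, the following toy sketch trains a softmax policy over a handful of options with a REINFORCE-style update. It is not the paper's training procedure: the policy here ignores context, and `reward_prob`, `NUM_OPTIONS`, and the learning rates are hypothetical values chosen for the example.

```python
import math
import random

random.seed(0)

NUM_OPTIONS = 4
# Made-up probability that each option yields a correct answer (reward 1).
reward_prob = [0.1, 0.2, 0.8, 0.3]
LEARNING_RATE = 0.1
BASELINE_RATE = 0.05


def softmax(xs):
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.0] * NUM_OPTIONS  # the entire policy: one number per option
baseline = 0.0                # running average reward, used to reduce variance

for _ in range(3000):
    probs = softmax(logits)
    # Sample a latent option from the current policy.
    z = random.choices(range(NUM_OPTIONS), weights=probs)[0]
    reward = 1.0 if random.random() < reward_prob[z] else 0.0
    advantage = reward - baseline
    baseline += BASELINE_RATE * (reward - baseline)
    # REINFORCE: the gradient of log pi(z) w.r.t. the logits is onehot(z) - probs.
    for k in range(NUM_OPTIONS):
        grad = (1.0 if k == z else 0.0) - probs[k]
        logits[k] += LEARNING_RATE * advantage * grad

final_probs = softmax(logits)
```

The policy in this sketch has only `NUM_OPTIONS` parameters, so each sampled reward carries information about a large share of the search space. A policy acting directly over a full vocabulary at every token would face a far larger action space.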

The paper states that OLLM’s alignment “arises from model structure rather than additional KL or handcrafted alignment losses,” suggesting a structural approach to model control. The researchers conclude that “optionized next-token modeling enhances controllability, robustness, and efficiency in math reasoning.”