A new position paper on arXiv, titled “LLM Reasoning Is Latent, Not the Chain of Thought” (arXiv:2604.15726), argues that large language model (LLM) reasoning should be studied as “latent-state trajectory formation” rather than as faithful surface chain-of-thought (CoT).
According to the paper, this distinction matters because “claims about faithfulness, interpretability, reasoning benchmarks, and inference-time intervention all depend on what the field takes the primary object of reasoning to be.” The researchers formalize three competing hypotheses: H1 (reasoning is primarily mediated by latent-state trajectories), H2 (reasoning is primarily mediated by explicit surface CoT), and H0 (most apparent reasoning gains are better explained by generic serial compute).
The paper states that after “reorganizing recent empirical, mechanistic, and survey work” and adding “compute-audited worked exemplars,” current evidence “most strongly supports H1 as a default working hypothesis.” The researchers recommend that “the field should treat latent-state dynamics as the default object of study for LLM reasoning.”
A related study on output diversity, published at the same time (arXiv:2604.16027), found that “diversity collapse is determined during training by data composition and cannot be addressed at inference time alone.” That study examined Think (chain-of-thought distillation) and Instruct model lineages across 15 tasks.