According to arxiv.org, a comprehensive survey titled “LLMOrbit” examines large language models from 2019 to 2025 and identifies three critical challenges facing the field: data scarcity, with 9-27 trillion tokens projected to be depleted by 2026-2028; exponential cost growth, from $3 million to over $300 million in five years; and a 22-fold increase in energy consumption.
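To put the cost figure in perspective, the short calculation below converts the reported jump into an implied yearly multiplier. It is a back-of-the-envelope sketch, not a number from the survey: it assumes a constant compounding rate over exactly five years and treats $300 million as the endpoint, although the survey reports "over" that amount. The function name `implied_annual_growth` is hypothetical.

```python
def implied_annual_growth(start: float, end: float, years: float) -> float:
    """Constant yearly multiplier that takes `start` to `end` in `years` years.

    Assumption (not from the survey): growth compounds at a fixed rate.
    """
    return (end / start) ** (1.0 / years)


# $3 million to $300 million is a 100x increase over five years.
factor = implied_annual_growth(3e6, 300e6, 5)
print(f"about {factor:.2f}x per year")  # about 2.51x per year
```

Under these assumptions, costs would have grown roughly 2.5-fold each year, a lower bound given that the reported endpoint exceeds $300 million.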
The survey, which analyzed more than 50 models from 15 organizations, documents six paradigms for addressing these “scaling wall” limitations, according to arxiv.org. Among them are test-time compute approaches, where models like o1 and DeepSeek-R1 achieve GPT-4 performance with 10x inference compute; quantization, which offers 4-8x compression; and distributed edge computing, which provides a 10x cost reduction. The research notes that DeepSeek-R1 achieved 79.8% on the MATH benchmark, and that open-source Llama 3’s 88.6% MMLU score surpassed GPT-4’s 86.4%.
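The 4-8x compression range is what one would expect when 32-bit floating-point weights are stored as 8-bit or 4-bit integers. The sketch below illustrates that arithmetic with a simple symmetric, per-tensor quantizer. The 32-bit baseline, the quantization scheme, and all function names are assumptions chosen for illustration; the summary does not say which methods the survey covers.

```python
def compression_ratio(baseline_bits: int, quantized_bits: int) -> float:
    """Storage saving from narrowing each weight (ignores scale metadata)."""
    return baseline_bits / quantized_bits


def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]  # toy values, not real model weights
for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    restored = dequantize(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    ratio = compression_ratio(32, bits)
    print(f"{bits}-bit: {ratio:.0f}x smaller, worst error {worst:.4f}")
```

Fewer bits give a larger saving but a coarser grid, so the worst-case rounding error grows as the bit width shrinks.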
Separately, a paper on arxiv.org presents “SPaCe,” a self-paced learning framework that achieves comparable or better accuracy than state-of-the-art baselines while using up to 100x fewer training samples through cluster-based data reduction and adaptive sample allocation.
Another arxiv.org paper, accepted by ACL 2026, examines reasoning failure dynamics in LLMs. It finds that errors often originate from “a small number of early transition points” coinciding with localized spikes in token-level entropy, and it introduces the GUARD framework to address these critical transitions.
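For readers unfamiliar with the quantity involved, token-level entropy measures how spread out a model's next-token probability distribution is at each generation step. The sketch below computes it and flags steps that jump well above the recent average. It is not the GUARD framework, whose details are not described here; the spike rule, the `window` and `ratio` parameters, and the toy distributions are all hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_spikes(entropies, window=3, ratio=2.0):
    """Indices whose entropy exceeds `ratio` times the mean of the
    preceding `window` steps (an illustrative rule, not the paper's)."""
    spikes = []
    for i in range(window, len(entropies)):
        baseline = sum(entropies[i - window:i]) / window
        if baseline > 0 and entropies[i] > ratio * baseline:
            spikes.append(i)
    return spikes


# Toy distributions over a four-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]  # model strongly prefers one token
uncertain = [0.25, 0.25, 0.25, 0.25]  # model is maximally unsure

steps = [confident, confident, confident, uncertain, confident, confident]
entropies = [token_entropy(p) for p in steps]
print(entropy_spikes(entropies))  # [3]
```

In practice the distributions would come from a model's softmax output at each decoding step rather than from hand-written lists.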