Three new preprints on arXiv address different aspects of making large language model (LLM) training and reasoning more efficient.
Training Data Selection with Gradient Orthogonality (arXiv:2602.06359v1) proposes a method for selecting training data to help LLMs adapt to specialized domains. According to the abstract, fine-tuning LLMs for specific domains “often necessitates a trade-off between acquiring domain expertise and retaining general reasoning capabilities, a phenomenon known as catastrophic forgetting.”
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation (arXiv:2505.18759v2) introduces a benchmark for evaluating chain-of-thought (CoT) distillation techniques. The paper notes that “data-centric distillation, including data augmentation, selection, and mixing, offers a promising path to creating smaller, more efficient student Large Language Models (LLMs) that retain strong reasoning abilities,” but indicates that the area currently “lacks a com[prehensive benchmark].”
Generating Data-Driven Reasoning Rubrics for Domain-Adaptive Reward Modeling (arXiv:2602.06795v1) addresses verification challenges in LLM reasoning outputs. According to the abstract, “LLMs struggle to reliably identify errors in thinking traces, particularly in long outputs, domains requiring expert knowledge, and problems without verifia[ble solutions].”
All three are preprints, and together they reflect ongoing work to make LLMs more efficient and more reliable in specialized applications.