Three New arXiv Papers Address Data Efficiency and Training Methods in AI

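Three recent papers on arXiv take different approaches to improving AI model training and efficiency: language self-play for data-free training, data-free continual learning in cloud-device collaboration, and uniform-state discrete diffusion for text generation.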

Language Self-Play For Data-Free Training

The authors of arXiv:2509.07414v3 target what they describe as “a fundamental bottleneck: the need for ever more data” in large language model development. The paper investigates language self-play as a route to data-free training, though the abstract does not detail specific results or methodologies.

Federated Learning Without Data Storage

The paper arXiv:2509.25977v2, titled “Data-Free Continual Learning of Server Models in Model-Heterogeneous Cloud-Device Collaboration,” focuses on cloud-device collaborative computing, where federated learning has become “a key enabler,” according to the abstract. The research addresses scenarios in which centralized cloud resources work alongside distributed edge devices.

Improving Discrete Diffusion Models

In arXiv:2506.10892v3, titled “The Diffusion Duality,” researchers investigate uniform-state discrete diffusion models. According to the abstract, these models “hold the promise of fast text generation due to their inherent ability to self-correct,” though the authors note that such models are “typically outperformed by autoregressive models and masked diffusion models.”

All three papers are revised versions (v2 or v3) of earlier arXiv submissions.