SkillOpt and CoSPlay: New Approaches to Self-Improving AI Agent Capabilities

Researchers have published two new frameworks on arxiv.org that address how AI agents can improve their own capabilities through automated optimization.

SkillOpt introduces what its authors describe as “the first systematic controllable text-space optimizer for agent skills.” According to the paper published May 25, 2026, the system treats agent skills as trainable external state rather than hand-crafted code. The framework uses a separate optimizer model that converts scored rollouts into “bounded add/delete/replace edits on a single skill document,” accepting edits only when they strictly improve performance. The researchers argue this approach applies the same discipline used in weight-space optimization to skill development.
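The accept-only-on-strict-improvement rule described above can be illustrated with a minimal Python sketch. This is not the SkillOpt implementation: the names `Edit`, `apply_edit`, `optimize_skill`, and `scripted_proposer` are hypothetical, the skill document is modeled as a plain list of lines, the "bound" is assumed to be a cap on edit length (`MAX_EDIT_CHARS`), and the optimizer model is replaced by a scripted stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

MAX_EDIT_CHARS = 200  # assumed size bound on an edit; not a value from the paper


@dataclass(frozen=True)
class Edit:
    op: str         # "add", "delete", or "replace"
    index: int      # line position in the skill document
    text: str = ""  # new text for "add" and "replace"


def apply_edit(doc: List[str], edit: Edit) -> List[str]:
    """Return a new document with the edit applied; the input is not mutated."""
    if len(edit.text) > MAX_EDIT_CHARS:
        raise ValueError("edit exceeds the size bound")
    new_doc = list(doc)
    if edit.op == "add":
        new_doc.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del new_doc[edit.index]
    elif edit.op == "replace":
        new_doc[edit.index] = edit.text
    else:
        raise ValueError(f"unknown op: {edit.op}")
    return new_doc


Proposer = Callable[[List[str], float], Optional[Edit]]
Scorer = Callable[[List[str]], float]


def optimize_skill(doc: List[str], propose: Proposer, score: Scorer,
                   steps: int) -> Tuple[List[str], float]:
    """Keep an edit only if it strictly raises the score of the document."""
    best_doc, best_score = list(doc), score(doc)
    for _ in range(steps):
        edit = propose(best_doc, best_score)
        if edit is None:  # the proposer has nothing more to suggest
            break
        try:
            candidate = apply_edit(best_doc, edit)
        except (ValueError, IndexError):
            continue  # malformed or out-of-range edits are simply skipped
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict improvement only
            best_doc, best_score = candidate, candidate_score
    return best_doc, best_score


def scripted_proposer(edits: Iterable[Edit]) -> Proposer:
    """Stand-in for an optimizer model: replays a fixed list of edits."""
    it = iter(edits)
    return lambda doc, current_score: next(it, None)


if __name__ == "__main__":
    required = {"validate input", "retry on timeout"}
    toy_score = lambda d: float(len(required & set(d)))  # toy rollout score
    proposer = scripted_proposer([
        Edit("replace", 0, "ignore errors"),   # lowers the score: rejected
        Edit("add", 1, "retry on timeout"),    # raises the score: accepted
        Edit("add", 2, "retry on timeout"),    # no change in score: rejected
    ])
    print(optimize_skill(["validate input"], proposer, toy_score, steps=5))
```

In this sketch an edit that leaves the score unchanged is rejected, so the document cannot drift or grow without a measured gain. In a real system the score would come from agent rollouts rather than a keyword check.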

CoSPlay (Cooperative Self-Play), also published on arxiv.org on May 25, 2026, addresses code generation without requiring ground-truth unit tests (GT UTs). The paper notes that existing approaches either require costly training with GT UTs or “lose competitiveness without them.” CoSPlay’s framework jointly improves both code and unit tests through what the authors call “cooperative self-play,” using bidirectional execution signals to iteratively refine both components. According to the paper, applying CoSPlay to Qwen2.5-7B-Instruct improved average Best-of-N results from 22.1% to 33.2% and unit test accuracy from 14.6% to 78.3% across four benchmarks, matching or surpassing CURE-7B, a model trained with reinforcement learning with verifiable rewards (RLVR). Code and data are available on GitHub and HuggingFace.
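One way to picture execution signals flowing in both directions, without any ground-truth tests, is a toy mutual-reinforcement score over a cross-execution pass matrix. The Python sketch below is an illustration only, not CoSPlay's method: the function names `pass_matrix` and `mutual_scores`, the candidate programs, and the weighting scheme are all assumptions made for this example. The paper's actual refinement procedure may differ substantially.

```python
from typing import Callable, List, Tuple

Test = Tuple[int, int]  # (input, expected output), itself possibly wrong


def pass_matrix(codes: List[Callable[[int], int]],
                tests: List[Test]) -> List[List[bool]]:
    """Run every candidate program on every candidate test."""
    matrix = []
    for fn in codes:
        row = []
        for x, expected in tests:
            try:
                row.append(fn(x) == expected)
            except Exception:
                row.append(False)  # a crash counts as a failed test
        matrix.append(row)
    return matrix


def _normalize(values: List[float]) -> List[float]:
    top = max(values, default=0.0)
    return [v / top for v in values] if top > 0 else list(values)


def mutual_scores(matrix: List[List[bool]],
                  rounds: int = 10) -> Tuple[List[float], List[float]]:
    """Code is weighted by the tests it passes; tests by the code passing them."""
    n_codes = len(matrix)
    n_tests = len(matrix[0]) if matrix else 0
    code_w = [1.0] * n_codes
    test_w = [1.0] * n_tests
    for _ in range(rounds):
        code_w = _normalize([
            sum(test_w[j] for j in range(n_tests) if matrix[i][j])
            for i in range(n_codes)
        ])
        test_w = _normalize([
            sum(code_w[i] for i in range(n_codes) if matrix[i][j])
            for j in range(n_tests)
        ])
    return code_w, test_w


if __name__ == "__main__":
    # Hypothetical task: square an integer. Two candidates are right, two wrong.
    codes = [lambda x: x * x, lambda x: x ** 2, lambda x: 2 * x, lambda x: x + 1]
    # The third test encodes a wrong expectation (3 -> 6).
    tests = [(2, 4), (3, 9), (3, 6), (0, 0)]
    code_w, test_w = mutual_scores(pass_matrix(codes, tests))
    print("code weights:", code_w)
    print("test weights:", test_w)
```

In this toy run the two correct programs end up with the highest weight and the wrong test with the lowest, because each side's score depends on the other's. The heuristic relies on agreement among candidates, so it can be misled when most candidates share the same mistake.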