Three New arXiv Papers Address Edge-Scale LLM Agents, Social Deduction Games, and Post-Training Mode Collapse

Three new preprints posted to arXiv examine edge-scale LLM agents, multi-agent reasoning in a social deduction game, and the loss of generation diversity after post-training.

Edge-Scale Agent Exploration: According to arXiv:2602.06485v1, researchers have introduced AgentCPM-Explore, focusing on “long-horizon deep exploration for edge-scale agents.” The abstract notes that while LLM-based agents show “remarkable potential for solving complex tasks,” existing systems rely heavily on large-scale models, leaving “the capabilities of edge-scale models largely underexplored.”

Social Deduction Game Testing: A paper on Mini-Mafia (arXiv:2509.23023v2) examines how LLMs perform in the social deduction game Mafia. According to the abstract, the game involves “informed mafia competing against uninformed townsfolk” and mirrors “real-world multi-agent scenarios” through its “asymmetry of information and reliance on theory-of-mind reasoning,” making it “a useful testbed for evaluating” multi-agent capabilities.

Selective Layer Restoration: Research in arXiv:2602.06665v1 addresses a side effect of post-training in LLMs. The abstract states that while post-training “improves instruction-following and helpfulness,” it “often reduces generation diversity, which leads to repetitive outputs in open-ended settings, a phenomenon known as mode collapse.” The paper proposes selective layer restoration as a solution.

All three papers are arXiv preprints and have not yet undergone peer review.