Three separate research teams have published agent frameworks designed to improve reliability and automation in specialized domains, according to papers posted on arxiv.org on May 18, 2026.
CAX-Agent addresses reliability challenges in MAPDL finite-element simulation by introducing an “agent harness” with structured execution control and fault recovery, according to arxiv.org. The system organizes execution into three layers (LLM service, agent harness, and solver backend) and adds a recovery ladder that escalates from deterministic rule patching, through model-driven regeneration, to human intervention. In testing on 50 structural benchmarks with 450 total case-runs, the model-driven recovery strategy achieved a completion rate of 0.9267 (about 92.7%) and a task score of 3.59 out of 4, outperforming the rule-only and no-recovery approaches with “large effect sizes,” the paper states.
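The escalation order of such a recovery ladder can be illustrated with a minimal Python sketch. This is not CAX-Agent's code: the function `run_with_recovery`, the `RunResult` type, the retry limit, and the toy solver and repair functions are all hypothetical stand-ins, and the "scripts" are placeholder strings rather than real MAPDL input. The sketch only shows the control flow the paper describes, in which a failed run is first handed to a deterministic rule patch, then to a model-driven regeneration step, and finally to a human.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunResult:
    ok: bool
    error: Optional[str] = None


# Hypothetical signatures: a solver runs a script; a repair step takes the
# script and the error message and returns a new script, or None to give up.
Solver = Callable[[str], RunResult]
Repair = Callable[[str, str], Optional[str]]


def run_with_recovery(
    script: str,
    solver: Solver,
    rule_patch: Repair,
    regenerate: Repair,
    max_attempts_per_rung: int = 2,  # assumed limit, not taken from the paper
) -> Tuple[str, str]:
    """Return (rung, script); rung is 'none', 'rule', 'model', or 'human'."""
    result = solver(script)
    if result.ok:
        return "none", script
    # Climb the ladder: the cheap deterministic rung first, then the model rung.
    for rung, repair in (("rule", rule_patch), ("model", regenerate)):
        for _ in range(max_attempts_per_rung):
            candidate = repair(script, result.error or "")
            if candidate is None:
                break  # this rung has nothing to offer, so escalate
            script = candidate
            result = solver(script)
            if result.ok:
                return rung, script
    # Both automated rungs are exhausted: hand the case to a person.
    return "human", script


# Toy stand-ins for demonstration only.
def toy_solver(script: str) -> RunResult:
    if "BAD" in script:
        return RunResult(False, "unknown command BAD")
    if "solve" not in script:
        return RunResult(False, "no solve step")
    return RunResult(True)


def toy_rule_patch(script: str, error: str) -> Optional[str]:
    # Deterministic fix for one known failure pattern.
    return script.replace("BAD", "mesh") if "BAD" in script else None


def toy_regenerate(script: str, error: str) -> Optional[str]:
    # Stands in for an LLM call that rewrites the script from the error.
    return script + "\nsolve"


for case in ("mesh\nsolve", "BAD\nsolve", "mesh"):
    rung, _ = run_with_recovery(case, toy_solver, toy_rule_patch, toy_regenerate)
    print(repr(case), "->", rung)
```

In this toy setup, recording which rung resolved each case is what would let rule-only, model-driven, and no-recovery strategies be compared on completion rate.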
ColPackAgent provides an agent framework for Monte Carlo simulations of colloidal packing, according to arxiv.org. The system uses a Model Context Protocol (MCP) tool server wrapping HOOMD-blue hard-particle Monte Carlo and encodes a “four-stage workflow contract.” The researchers note that “without dedicated simulation tools and workflow instructions, general-purpose Large Language Model (LLM) agents tend to describe such workflows rather than execute them reliably.”
FORGE (Failure-Optimized Reflective Graduation and Evolution) enables LLM agents to improve decision-making through self-generated memory without gradient updates, according to arxiv.org. Tested on the CybORG CAGE-2 network-defense benchmark, FORGE improved average evaluation returns by a factor of 1.7 to 7.7 over zero-shot baselines and by 29% to 72% over Reflexion baselines across 12 model-representation conditions, the paper reports.