According to a paper published on arxiv.org, researchers have introduced OLIVIA (Online Learning via Inference-time Action Adaptation), a framework designed to improve decision-making in ReAct-style large language model agents during deployment.
The paper explains that LLM agents interleaving reasoning, action selection, and observation face challenges in deployed settings where “small action-selection errors can accumulate into wasted tool calls, latency, and reduced reliability.” According to the researchers, existing inference-time adaptation methods “mainly rely on prompting or retrieval, which influence behavior indirectly through context manipulation” and “do not expose an explicit decision layer that can score candidate actions, represent uncertainty, or be updated online from action-level feedback.”
OLIVIA addresses these limitations by modeling “the LLM’s final action-selection layer as a contextual linear bandit over candidate actions, with frozen hidden states as decision contexts,” according to the paper. The framework uses upper-confidence-bound (UCB) exploration to improve the policy in a sample-efficient manner, with “minimal computational overhead.”
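To make the idea of a contextual linear bandit with upper-confidence-bound exploration concrete, here is a minimal sketch of a generic LinUCB-style scorer. It is not the paper's implementation: the class name `LinUCBScorer`, the parameters `alpha` and `ridge`, and the toy two-dimensional feature vectors are all assumptions made for illustration. In a real agent, each candidate's feature vector would presumably be derived from the frozen hidden state, but how OLIVIA builds those vectors is not described here.

```python
import math


class LinUCBScorer:
    """Hypothetical shared-parameter linear UCB scorer over candidate actions."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration strength (assumed hyperparameter)
        # Inverse of the regularized design matrix, initialized to (ridge * I)^-1.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward-weighted features

    def _matvec(self, matrix, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in matrix]

    def theta(self):
        # Ridge-regression estimate of the reward weights.
        return self._matvec(self.A_inv, self.b)

    def score(self, x):
        # Estimated reward plus an uncertainty bonus (the UCB term).
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        a_inv_x = self._matvec(self.A_inv, x)
        variance = sum(xi * a for xi, a in zip(x, a_inv_x))
        return mean + self.alpha * math.sqrt(max(variance, 0.0))

    def select(self, candidates):
        # Return the index of the candidate with the highest UCB score.
        scores = [self.score(x) for x in candidates]
        return max(range(len(candidates)), key=scores.__getitem__)

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv after observing feedback.
        a_inv_x = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, a_inv_x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]


# Toy usage: two candidate actions; only the second one ever pays off.
candidates = [[1.0, 0.0], [0.0, 1.0]]
true_reward = [0.0, 1.0]
scorer = LinUCBScorer(dim=2, alpha=0.5)
for _ in range(20):
    chosen = scorer.select(candidates)
    scorer.update(candidates[chosen], true_reward[chosen])
```

In this toy loop the scorer tries the first action once, sees no reward, and then settles on the second action as its estimated value grows and its uncertainty bonus shrinks. Updating only a small linear layer on top of fixed features, rather than the model itself, is one plausible reading of why such an approach could keep overhead low.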
The researchers tested OLIVIA on four benchmarks, reporting that it “consistently improves task performance over static ReAct and prompt-based inference-time baselines.” According to the paper, the results suggest that “explicit online decision layers provide an effective alternative to purely prompt- or retrieval-based adaptation for LLM agents during deployment.”