IntentScore Improves Computer-Use Agent Performance Through Intent-Aware Action Evaluation

New reward model learns to score AI agent actions by embedding planning intent, achieving 97.5% pairwise discrimination accuracy and improving task success rate by 6.9 percentage points.

According to a paper posted on arxiv.org, researchers have developed IntentScore, a plan-aware reward model that evaluates the quality of actions taken by Computer-Use Agents (CUAs), AI systems that use large language models to execute GUI operations in desktop environments.

The research paper states that current CUAs “generate actions without evaluating action quality, leading to irreversible errors that cascade through subsequent steps.” IntentScore addresses this problem by learning to score candidate actions; its training data comprises 398,000 offline GUI interaction steps spanning three operating systems.

According to the paper, IntentScore is trained with two complementary objectives: contrastive alignment for state-action relevance and margin ranking for action correctness. A key architectural innovation is that the model “embeds each candidate’s planning intent in the action encoder, enabling discrimination between candidates with similar actions but different rationales.”
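
To make the two objectives concrete, the sketch below shows one common way such losses are written. It is an illustration under stated assumptions, not the paper's implementation: the InfoNCE-style form of the contrastive term, the hinge form of the ranking term, the temperature, the margin, and the `ranking_weight` parameter are all hypothetical choices, and the embeddings are plain Python lists rather than the outputs of a real encoder.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def contrastive_alignment_loss(state_embs, action_embs, temperature=0.1):
    """InfoNCE-style loss: state i should match action i better than any
    other action in the batch. (Assumed form, not taken from the paper.)"""
    total = 0.0
    for i, state in enumerate(state_embs):
        logits = [dot(state, action) / temperature for action in action_embs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[i]  # negative log-probability of the true pair
    return total / len(state_embs)

def margin_ranking_loss(score_correct, score_incorrect, margin=1.0):
    """Hinge penalty unless the correct action outscores the incorrect one by `margin`."""
    return max(0.0, margin - (score_correct - score_incorrect))

def combined_loss(state_embs, action_embs, score_pairs, ranking_weight=1.0):
    """Weighted sum of both objectives; `ranking_weight` is a hypothetical knob."""
    ranking = sum(margin_ranking_loss(c, w) for c, w in score_pairs) / len(score_pairs)
    return contrastive_alignment_loss(state_embs, action_embs) + ranking_weight * ranking
```

In this sketch the contrastive term rewards matching each state with its own action within a batch, while the ranking term only penalizes a pair when the correct action fails to beat the incorrect one by the chosen margin.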

The results show that IntentScore achieves 97.5% pairwise discrimination accuracy on held-out evaluation data. Deployed as a re-ranker for Agent S3 on OSWorld, an environment entirely unseen during training, it improves task success rate by 6.9 percentage points, according to the paper. The paper concludes this demonstrates “that reward estimation learned from heterogeneous offline trajectories generalizes to unseen agents and task distributions.”
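
A rough sketch of what re-ranking and pairwise discrimination accuracy mean in practice follows. Everything in it is illustrative: the `Candidate` type, the `rerank` and `pairwise_accuracy` helpers, and especially `toy_scorer`, a word-overlap stand-in for the learned reward model, are hypothetical names introduced here, and pairwise accuracy is assumed to mean the fraction of pairs in which the correct action receives the higher score.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence, Tuple

@dataclass(frozen=True)
class Candidate:
    action: str  # a GUI operation proposed by the agent
    intent: str  # the planning rationale attached to that action

# Hypothetical scorer signature: (state description, candidate) -> score
Scorer = Callable[[str, Candidate], float]

def rerank(state: str, candidates: Sequence[Candidate], scorer: Scorer) -> Candidate:
    """Return the candidate the scorer rates highest for the current state."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: scorer(state, c))

def pairwise_accuracy(score_pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (correct, incorrect) score pairs where the correct one wins."""
    pairs = list(score_pairs)
    wins = sum(1 for correct, incorrect in pairs if correct > incorrect)
    return wins / len(pairs)

def toy_scorer(state: str, candidate: Candidate) -> float:
    """Stand-in for a learned model: counts words shared by the state and the intent."""
    state_words = set(state.lower().split())
    return float(len(state_words & set(candidate.intent.lower().split())))

# Two candidates with the same action but different rationales.
candidates = [
    Candidate("click(120, 300)", "close the browser window"),
    Candidate("click(120, 300)", "confirm the save dialog"),
]
best = rerank("save dialog is open", candidates, toy_scorer)
print(best.intent)
```

Because both candidates share the same action string, only the intent text can separate them here, which mirrors the paper's stated motivation for embedding planning intent in the action encoder.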

The research was published on arxiv.org on May 25, 2026.