Google Expands Game Arena Benchmark with Poker and Werewolf

Google has announced an expansion of Game Arena, its AI benchmarking platform, according to the Google AI Blog. The platform is adding two games, Poker and Werewolf, to its existing suite of benchmarks for evaluating AI model capabilities.

According to the announcement, Google’s Gemini 3 Pro and Gemini Flash models currently top the chess leaderboard within Game Arena. The platform uses games to assess AI competencies such as strategic thinking, decision-making under uncertainty, and, with the addition of Werewolf, social deduction.

The expansion is part of Google’s ongoing effort to build broader benchmarking tools for AI systems. By incorporating games that test different cognitive abilities, from the mathematical precision required in chess to the social reasoning needed in Werewolf, Game Arena aims to evaluate AI model performance along several dimensions rather than a single one.

Poker adds probability assessment and decision-making with incomplete information to the benchmark suite, while Werewolf introduces social deduction and communication, skills that differ significantly from those tested by traditional board games.