In an unusual experiment designed to test real-time reasoning, nine large language models spent five days playing no-limit Texas hold ’em against each other in a fully automated environment.
The competitors included OpenAI’s o3, Claude Sonnet 4.5 from Anthropic, Grok from xAI, Gemini 2.5 Pro from Google, Llama 4 from Meta, DeepSeek R1, Kimi K2 from Moonshot AI, Magistral from Mistral AI, and GLM 4.6 from Z.AI.
Each model started with a $100,000 bankroll and played thousands of hands at $10/$20 tables. The tournament was run by PokerBattle.ai, with the same initial prompt and rules applied to every participant.
At the end of the week, OpenAI’s o3 model finished with the highest profit, up $36,691.
How the models performed
OpenAI’s o3 showed the most consistent performance across the tournament. It won three of the five largest pots and stayed close to established pre-flop strategy, avoiding large losses while steadily accumulating chips.
Claude Sonnet 4.5 from Anthropic finished second with $33,641 in profit. xAI’s Grok placed third, ending the tournament up $28,796.
Google’s Gemini 2.5 Pro recorded a modest profit, while several other models struggled to maintain their stacks. Meta’s Llama 4 lost its entire bankroll early in the event. Moonshot AI’s Kimi K2 also performed poorly, finishing down more than $13,000.
The remaining models landed between these extremes, neither collapsing nor standing out.
What poker reveals about AI decision-making
Poker is often used as a benchmark for general reasoning systems because it combines incomplete information, probability, and opponent modeling. Unlike perfect-information games such as chess, poker rewards managing uncertainty and adapting to opponents' changing behavior.
Across the tournament, most models showed a tendency toward aggressive play. They favored action over caution, often pursuing large pots rather than minimizing losses. Bluffing attempts were frequent but inconsistent, usually driven by incorrect hand evaluations rather than deliberate deception.
Despite these weaknesses, the top-performing models demonstrated an ability to adjust strategies over time and respond to opponent patterns. Their decisions reflected probabilistic reasoning rather than simple rule following.
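To make that kind of probabilistic reasoning concrete, the sketch below shows one of the simplest calculations a poker-playing agent can apply: comparing estimated hand equity against pot odds to decide whether a call is profitable. The function names and example numbers are illustrative only and are not drawn from the PokerBattle.ai setup.

```python
# Minimal sketch of pot-odds reasoning, the kind of expected-value check a
# poker agent might apply when facing a bet. Names and numbers are illustrative.

def call_ev(pot: float, to_call: float, equity: float) -> float:
    """Expected value of calling: win the current pot with probability
    `equity`, lose the call amount otherwise."""
    return equity * pot - (1 - equity) * to_call

def should_call(pot: float, to_call: float, equity: float) -> bool:
    """Call when estimated equity exceeds the pot odds (the price of the call)."""
    pot_odds = to_call / (pot + to_call)
    return equity > pot_odds

# Example: $300 in the pot, opponent bets $100 (pot is now $400, $100 to call),
# and we estimate roughly 30% equity on a draw.
pot, bet, equity = 400, 100, 0.30
print(should_call(pot, bet, equity))        # True: 30% > 100/500 = 20%
print(round(call_ev(pot, bet, equity), 2))  # 50.0, a profitable call on average
```

Repeating that comparison across many hands, rather than following fixed rules about which cards to play, is roughly what distinguishes the stronger performers described above.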
Limits still visible
The experiment also highlighted persistent shortcomings. Several models misjudged position, overestimated hand strength, or failed to disengage from losing scenarios. These errors mirror broader challenges seen in real-world AI deployments, where systems can draw confident conclusions from incomplete or misinterpreted inputs.
While the tournament does not suggest AI systems understand poker in a human sense, it does show progress in managing ambiguity and making sequential decisions under pressure.