Research shows that even the best AI models 'go bankrupt' when it comes to predicting Premier League results.

Despite dominating programming benchmarks and solving math problems, the world's most advanced AI systems from Google, OpenAI, and xAI all falter when faced with the unpredictable nature of Premier League football.

A new study called KellyBench from the London-based startup General Reasoning points to something telling: artificial intelligence is still unable to overcome the "chaos" of the real world. In a simulated run of the entire 2023-24 Premier League season, top AI models showed significant shortcomings in long-term reasoning and risk management.


A "master" programmer but a "novice" when it comes to gambling.

The study placed eight AI models in an offline environment, provided them with detailed historical data, and asked each to build a profitable betting strategy. The results were striking: most of these electronic "brains" ended the season at a loss, or even bankrupt.

  • Claude Opus 4.6 (Anthropic): The most stable performer, but it still posted an average loss of 11%.

  • Grok 4.20 (xAI): A major disappointment, failing on its very first attempt and unable to complete subsequent tests.

  • Gemini 3.1 Pro (Google): The rare exception, with one run achieving a 34% profit, even though another attempt ended in financial failure.

Overall, these top-tier AI systems still fall far short of professional gamblers, who rely on intuition and practical experience.
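
The benchmark's name points to the Kelly criterion, a classic formula for deciding what fraction of a bankroll to stake on a bet, given an estimated win probability and the bookmaker's odds. The sketch below is a minimal illustration of that kind of bet sizing, not code from the study; the probability, odds, and bankroll figures are hypothetical.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Kelly criterion: fraction of the bankroll to stake on a single bet.

    p_win        -- estimated probability that the bet wins (e.g. a model's forecast)
    decimal_odds -- bookmaker decimal odds (total payout per unit staked)
    """
    b = decimal_odds - 1.0          # net profit per unit staked if the bet wins
    edge = p_win * b - (1.0 - p_win)
    return max(0.0, edge / b)       # never stake anything when the edge is negative


# Hypothetical example: a model estimates a 55% chance of a home win
# at decimal odds of 2.10, with a 1,000-unit bankroll.
bankroll = 1000.0
fraction = kelly_fraction(0.55, 2.10)
print(f"Stake {bankroll * fraction:.2f} units ({fraction:.1%} of the bankroll)")
```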


The gap between the lab and real life

Ross Taylor, CEO of General Reasoning and a former researcher at Meta AI, argues that these results show how overhyped AI automation has become. He believes current evaluation benchmarks focus too much on "static environments" (such as writing code or summarizing text) while neglecting the variability and context-dependent nature of real-world scenarios.

"If you apply AI to real-world tasks with long-term vision and constantly changing variables, the results will be terrible," Taylor told the Financial Times.

Lessons on practical reasoning skills

The KellyBench experiment demonstrates that the ability to write software or solve well-structured problems does not translate into an ability to navigate the uncertain feedback loops of the real world.

Although developers are working to bridge the gap between digital intelligence and real-world reasoning, variables such as player performance, injuries, and moments of brilliance on the pitch remain a "difficult problem" that no algorithm has yet fully solved.

Update 15 April 2026

David Pac

David Pac is a senior IT professional who designs the overall technical vision and structure of a project, transforming business requirements into viable software/system solutions.
