ModelBall
April 23, 2026

The prestige problem: why AI overvalues Premier League players

bias, research

When we asked five frontier AI models to evaluate identical player profiles — same age, same stats, same position — with only the league name changed, something troubling emerged. Players “from” the Premier League were consistently rated 12-18% higher than the same profile “from” the Belgian Pro League or Argentine Primera. This is the prestige problem, and it's baked into how AI sees football.

The experiment

Our D01 dimension (League Prestige Distortion) tests whether models value players based on their actual attributes or based on the perceived prestige of their league. We created 900 carefully controlled comparisons where the only variable was league context.

The setup is simple: two hypothetical players with identical statistics. Same goals, same assists, same defensive actions, same age. The only difference is that Player A plays in the Premier League while Player B plays in, say, the Eredivisie or Liga MX. We then ask each model: “Which player would you recommend for a Champions League side?”

Example stimulus

Player A (Premier League)
  • Age: 24
  • Goals: 14
  • Assists: 8
  • xG: 12.3
  • Pass completion: 84%
Player B (Belgian Pro League)
  • Age: 24
  • Goals: 14
  • Assists: 8
  • xG: 12.3
  • Pass completion: 84%

Statistics are identical. The only difference is league name. Rational evaluation should show no preference. But that's not what we found.
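A minimal sketch of how such a paired stimulus could be generated, so that the league name is guaranteed to be the only field that differs. The names `PlayerProfile`, `make_pair`, and `build_prompt` are hypothetical and chosen for illustration; this is not our production harness.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlayerProfile:
    """One hypothetical player card, using the fields from the example stimulus."""
    league: str
    age: int
    goals: int
    assists: int
    xg: float
    pass_completion: int  # percent


def make_pair(base: PlayerProfile, other_league: str):
    """Return two profiles that are identical except for the league name."""
    return base, replace(base, league=other_league)


def render(label: str, p: PlayerProfile) -> str:
    """Format one profile in the same layout as the example stimulus."""
    return (
        f"Player {label} ({p.league})\n"
        f"  • Age: {p.age}\n"
        f"  • Goals: {p.goals}\n"
        f"  • Assists: {p.assists}\n"
        f"  • xG: {p.xg}\n"
        f"  • Pass completion: {p.pass_completion}%\n"
    )


def build_prompt(a: PlayerProfile, b: PlayerProfile) -> str:
    """Combine both cards with the question put to each model."""
    question = "Which player would you recommend for a Champions League side?"
    return render("A", a) + render("B", b) + "\n" + question


player_a, player_b = make_pair(
    PlayerProfile("Premier League", 24, 14, 8, 12.3, 84),
    "Belgian Pro League",
)
prompt = build_prompt(player_a, player_b)
```

Deriving Player B from Player A with `replace` means a typo in one stat line cannot create an unintended second variable.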

What we found

Every model showed league prestige bias. Every single one. The magnitude varied, but the direction was universal: players from more prestigious leagues were systematically preferred over statistically identical players from less prestigious leagues.

Model     Prestige bias rate   Strongest for
Gemini    71%                  Premier League
GPT-5.4   64%                  La Liga
Claude    58%                  Premier League
Grok      52%                  Bundesliga

Gemini shows the strongest prestige bias at 71%: given identical players from different leagues, it chooses the more prestigious league's player 71% of the time. A model with no league preference would land at 50%. Grok is closest to that neutral point at 52%, but still shows measurable bias.
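The bias rate can be read as a simple proportion. The sketch below assumes each comparison is recorded as either `"prestige"` (the model picked the more prestigious league's player) or `"other"`, and that any other response, such as a refusal to choose, is left out of the denominator. Both the labels and that exclusion rule are assumptions made for illustration.

```python
from collections import Counter


def prestige_bias_rate(choices):
    """Share of decided comparisons won by the more prestigious league's player.

    `choices` is an iterable of strings; only "prestige" and "other" count
    as decided comparisons (an assumption, see the text above).
    """
    counts = Counter(choices)
    decided = counts["prestige"] + counts["other"]
    if decided == 0:
        raise ValueError("no decided comparisons to score")
    return counts["prestige"] / decided


# Illustrative input only: 71 prestige picks out of 100 decided comparisons.
example_rate = prestige_bias_rate(["prestige"] * 71 + ["other"] * 29)
```

Under this definition a rate of 0.5 means no preference, and anything above it means the model leans toward the prestigious league.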

Why this matters for the World Cup

The World Cup brings together players from every league imaginable. The Japanese squad includes players from J1 League. Morocco's squad has players from Botola Pro. The US team has MLS players alongside their European-based stars.

If AI models systematically undervalue players from these leagues, their predictions for these national teams will be systematically wrong. A model that thinks Premier League experience is inherently worth 15% more will overrate England and underrate Japan — exactly the kind of error that led most models astray in our calibration testing when Japan beat England.

The hierarchy we found

Across all five models, a clear league hierarchy emerged. This isn't the same as objective league quality (which UEFA coefficients attempt to measure). This is perceived prestige as encoded in AI training data.

Implicit league hierarchy (aggregated)

Tier 1: Premier League, La Liga
Tier 2: Bundesliga, Serie A, Ligue 1
Tier 3: Eredivisie, Primeira Liga, Belgian Pro League
Tier 4: MLS, Liga MX, J1 League, Argentine Primera
Tier 5: Most other leagues

Real scouting implications

This bias isn't just an academic concern. Football clubs increasingly use AI-powered tools for player identification, shortlisting, and valuation. If those tools encode the same prestige biases we found, clubs are systematically overlooking talent.

Consider: a 23-year-old striker in the Colombian Primera with 0.65 xG per 90 might be genuinely more valuable than a similarly-aged Premier League striker with 0.45 xG per 90. The Colombian player costs a fraction of the price and has better underlying numbers. But if your AI scouting tool has a 15-20% prestige penalty built in, you might never see that opportunity.

This is exactly why we built our enterprise audit service. If you're using AI for scouting, you should know what biases are baked in.

How we correct for this

In our ensemble methodology, we reduce the weight of highly-biased models in contexts where that bias is likely to hurt accuracy. For matches involving teams with many players from non-Big 5 leagues, Gemini's influence is reduced while Grok's is increased.

This doesn't eliminate the bias — all five models show it — but it reduces its impact on final predictions. Whether this correction actually improves accuracy is something we'll learn over 104 matches.
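One way such a reweighting could work is sketched below. The formula, the `exposure` input (the share of a squad playing outside the Big 5 leagues), and the `strength` parameter are all hypothetical and chosen for illustration; only the bias rates come from the table above. This is not the actual ensemble formula.

```python
# Prestige bias rates from the results table.
BIAS_RATE = {"Gemini": 0.71, "GPT-5.4": 0.64, "Claude": 0.58, "Grok": 0.52}


def ensemble_weights(bias_rate, exposure, strength=2.0):
    """Down-weight more biased models as non-Big 5 exposure grows.

    Hypothetical rule: raw weight = 1 - strength * exposure * (bias - 0.5),
    floored at zero, then normalized so the weights sum to 1.
    `exposure` must lie in [0, 1].
    """
    if not 0.0 <= exposure <= 1.0:
        raise ValueError("exposure must be between 0 and 1")
    raw = {
        model: max(0.0, 1.0 - strength * exposure * (bias - 0.5))
        for model, bias in bias_rate.items()
    }
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all models were weighted to zero")
    return {model: weight / total for model, weight in raw.items()}


# No exposure: every model gets an equal share.
equal = ensemble_weights(BIAS_RATE, exposure=0.0)
# Full exposure: Gemini's share shrinks and Grok's grows.
adjusted = ensemble_weights(BIAS_RATE, exposure=1.0)
```

Because the weights are normalized, reducing one model's influence automatically raises the others', which matches the Gemini-down, Grok-up behavior described above.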

What's next

Tomorrow, we'll look at another systematic bias: how Claude over-adjusts for home advantage, and what that means for matches involving the three host nations. If you're betting on USA home games, you'll want to understand this one.
