ModelBall

THE COMPETITORS

Five AIs. Different Biases. Same Matches.

Each model has been fingerprinted across 45,300 queries.
We know their blind spots. Now we test whether that knowledge helps.

Fingerprint comparison

Chart axes: League prestige, Club prestige, Demographics, Age curve, Temporal weight, Tournament pedigree, Attribute type, Role value, Risk tolerance, Media narrative, Tactical knowledge, Tactical context, Fixture difficulty, Home advantage, Upset ID, Narrative override, Odds integration, Form recency, Squad depth, Key player absence, Stakes & pressure, xG integration.

Distance from center = bias strength. Each spike shows where that AI over- or under-values something.

Every model is given the same evidence: the same rankings, the same form, the same news, the same prompt. The differences plotted here are purely artefacts of how each model reasons.

Full blind spot comparison

Bias strength across all dimensions. Red = bigger bias.

Dimension                               GPT-5.4  GPT-5.5   Claude     Grok   Gemini
D01 League prestige discount              +1.37    +1.38    +1.35    +1.18    +1.41
D02 Club prestige halo                    +0.65    +0.20    +0.19    +0.62    +0.04
D03 Demographic evaluation consistency    -0.11    -0.55    +0.09    +1.17    +1.14
D04 Age curve encoding                    +0.75    +0.93    +1.06    +0.87    +0.75
D05 Temporal weighting                    -0.61    -0.64    -0.69    -0.61    -0.47
D06 Tournament pedigree encoding          -0.21    -0.64    -0.47    -0.33    -0.57
D07 Attribute type preference             -0.75    -0.94    -1.14    -0.47    -0.31
D08 Role value encoding                   -0.58    +0.14    -0.24    -0.00    +1.27
D09 Risk tolerance in selection           -0.70    -1.13    -0.37    -0.88    -0.96
D10 Media narrative anchoring             +0.03    -0.05    -0.16    +0.12    -0.00
D11 Tactical knowledge index              +1.60    +1.33    +1.73    +1.87    +3.47
D12 Tactical context adjustment           -1.30    -1.30    -1.11    -1.44    -0.70
PC01 Fixture difficulty calibration       -0.75    -0.54    -1.08     0.00    -0.04
PC02 Home advantage calibration           +0.12     0.00    +0.86    -0.99    +0.81
PC03 Upset identification                 -1.57    -1.57    -1.57    -1.57    -1.57
PC04 Team narrative override              -1.57    -1.57    -1.57    -0.03    -1.57
PC05 Odds integration                     -1.05    -1.57    -1.57    +0.64    -1.57
PC06 Form recency integration             -1.57    -1.28    -1.37    -0.79    -1.22
PC09 Squad depth & fatigue                -1.57    -1.33    -1.57    -1.25    -1.07
PC10 Key player absence                   -1.30    -1.02    -1.09    -1.18    -1.30
PC11 Stakes & pressure calibration        -1.57    -1.57    -1.57    -1.57    -1.57
PC14 Expected goals integration           -1.57    -1.57    -1.57    -1.57    -1.57

Legend: Under-correction · Over-correction
Scale: −1.57 · 0 · +1.57
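
One way to read the table is to ask which dimension carries the largest absolute score for each model. The sketch below does that in Python. The selection rule (largest absolute value, ties going to the dimension listed first) is an assumption made for illustration, not a documented ModelBall method, and the names `MODELS`, `SCORES` and `strongest_bias` are hypothetical.

```python
# Scores transcribed from the comparison table; columns follow its header order.
MODELS = ["GPT-5.4", "GPT-5.5", "Claude", "Grok", "Gemini"]

SCORES = {
    "League prestige discount":           [+1.37, +1.38, +1.35, +1.18, +1.41],
    "Club prestige halo":                 [+0.65, +0.20, +0.19, +0.62, +0.04],
    "Demographic evaluation consistency": [-0.11, -0.55, +0.09, +1.17, +1.14],
    "Age curve encoding":                 [+0.75, +0.93, +1.06, +0.87, +0.75],
    "Temporal weighting":                 [-0.61, -0.64, -0.69, -0.61, -0.47],
    "Tournament pedigree encoding":       [-0.21, -0.64, -0.47, -0.33, -0.57],
    "Attribute type preference":          [-0.75, -0.94, -1.14, -0.47, -0.31],
    "Role value encoding":                [-0.58, +0.14, -0.24, -0.00, +1.27],
    "Risk tolerance in selection":        [-0.70, -1.13, -0.37, -0.88, -0.96],
    "Media narrative anchoring":          [+0.03, -0.05, -0.16, +0.12, -0.00],
    "Tactical knowledge index":           [+1.60, +1.33, +1.73, +1.87, +3.47],
    "Tactical context adjustment":        [-1.30, -1.30, -1.11, -1.44, -0.70],
    "Fixture difficulty calibration":     [-0.75, -0.54, -1.08, 0.00, -0.04],
    "Home advantage calibration":         [+0.12, 0.00, +0.86, -0.99, +0.81],
    "Upset identification":               [-1.57, -1.57, -1.57, -1.57, -1.57],
    "Team narrative override":            [-1.57, -1.57, -1.57, -0.03, -1.57],
    "Odds integration":                   [-1.05, -1.57, -1.57, +0.64, -1.57],
    "Form recency integration":           [-1.57, -1.28, -1.37, -0.79, -1.22],
    "Squad depth & fatigue":              [-1.57, -1.33, -1.57, -1.25, -1.07],
    "Key player absence":                 [-1.30, -1.02, -1.09, -1.18, -1.30],
    "Stakes & pressure calibration":      [-1.57, -1.57, -1.57, -1.57, -1.57],
    "Expected goals integration":         [-1.57, -1.57, -1.57, -1.57, -1.57],
}


def strongest_bias(model: str) -> tuple[str, float]:
    """Return (dimension, score) with the largest absolute score for `model`.

    Assumed rule: ties go to the dimension listed first, because max()
    keeps the first maximum it meets and dicts preserve insertion order.
    """
    col = MODELS.index(model)
    name = max(SCORES, key=lambda dim: abs(SCORES[dim][col]))
    return name, SCORES[name][col]


for model in MODELS:
    dimension, score = strongest_bias(model)
    print(f"{model}: {dimension} ({score:+.2f})")
```

Under this assumed rule, GPT-5.4, Claude, Grok and Gemini all come out with the tactical knowledge index as their largest score, while GPT-5.5 comes out with upset identification, the first of several dimensions tied at -1.57.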

Meet the models

OpenAI

GPT-5.4

The Market Baseline

8.4/10
Tactical knowledge · #2 of 5
Style: Conservative and balanced predictions
Widely used, moderate across dimensions
Follows market consensus too closely
Best for:
  • Standard league matches
  • Fixture difficulty assessment
  • Balanced predictions
Watch for:
  • Host nation matches
  • High-stakes scenarios
  • Form recency weighting
Strongest bias: Tactical knowledge
Calibration issues: 9 severe
View full fingerprint →
OpenAI

GPT-5.5

The Evolved Baseline

8.7/10
Tactical knowledge · #1 of 5
Style: More skeptical of popular narratives
Reduced narrative bias vs GPT-5.4
Still anchors on media sentiment
Best for:
  • Counter-narrative situations
  • Undervalued teams
  • Non-prestige matchups
Watch for:
  • High-profile fixtures
  • Tournament drama
  • Media-hyped scenarios
Strongest bias: Upset ID
Calibration issues: 8 severe
View full fingerprint →
Anthropic

Claude Sonnet 4.6

The Analyst

8.3/10
Tactical knowledge · #3 of 5
Style: Thoughtful but over-corrects for context
Best tactical reasoning, detailed analysis
Over-adjusts home advantage by 15-20%
Best for:
  • Tactical analysis
  • Key player absence impact
  • xG-based predictions
Watch for:
  • Home advantage calibration
  • Host nation matches
  • Prestige matchups
Strongest bias: Tactical knowledge
Calibration issues: 10 severe
View full fingerprint →
xAI

Grok 3

The Contrarian

8.1/10
Tactical knowledge · #4 of 5
Style: Data-driven, sometimes too mechanical
Strong odds integration, market-aware
Ignores home crowd effect entirely
Best for:
  • Odds divergence scenarios
  • Market-informed predictions
  • Upset identification
Watch for:
  • Home advantage scenarios
  • Tournament pressure contexts
  • Form recency
Strongest bias: Tactical knowledge
Calibration issues: 8 severe
View full fingerprint →
Google

Gemini 3.1 Pro

The Generalist

6.5/10
Tactical knowledge · #5 of 5
Style: Broad knowledge, skewed by reputation
Good at cross-domain reasoning
Biggest league prestige bias (Premier League × 1.4)
Best for:
  • Squad depth assessment
  • Fatigue factors
  • Cross-competition analysis
Watch for:
  • League prestige matchups
  • Non-Big 5 leagues
  • Tournament narratives
Strongest bias: Tactical knowledge
Calibration issues: 9 severe
View full fingerprint →

Plus two ensemble methods

Beyond the five individual models, we track two ensemble approaches.

Simple Average

Average of all 5 model predictions. Equal weights. The baseline to beat.

If The Edge can't outperform this, knowing blind spots doesn't help.
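
As a minimal sketch of what an equal-weight average can look like, the Python below averages each outcome probability across the five models. It assumes every model returns home/draw/away probabilities that sum to 1; that prediction format, the function name `simple_average`, and the example numbers are all hypothetical and are not taken from ModelBall.

```python
def simple_average(predictions: dict[str, dict[str, float]]) -> dict[str, float]:
    """Equal-weight mean of per-model outcome probabilities."""
    n = len(predictions)
    outcomes = next(iter(predictions.values())).keys()
    return {o: sum(p[o] for p in predictions.values()) / n for o in outcomes}


# Illustrative numbers only, not real model output.
EXAMPLE = {
    "GPT-5.4": {"home": 0.50, "draw": 0.30, "away": 0.20},
    "GPT-5.5": {"home": 0.45, "draw": 0.30, "away": 0.25},
    "Claude":  {"home": 0.55, "draw": 0.25, "away": 0.20},
    "Grok":    {"home": 0.40, "draw": 0.30, "away": 0.30},
    "Gemini":  {"home": 0.60, "draw": 0.25, "away": 0.15},
}

average = simple_average(EXAMPLE)
print({outcome: round(prob, 3) for outcome, prob in average.items()})
```

Because every input distribution sums to 1 and the weights are equal, the averaged probabilities also sum to 1, so no renormalisation step is needed.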

The Edge

Bias-corrected blend. Models weighted by their known blind spots for each match context.
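
The page does not spell out how the weights are derived, so the following Python is only a sketch of one possible scheme, not The Edge itself. It assumes a match where home advantage is the relevant context, takes each model's home advantage calibration score from the comparison table, and down-weights models in proportion to the size of that score. The weighting formula `1 / (1 + |bias|)`, the function names, and the example predictions are all hypothetical.

```python
# Home advantage calibration (PC02) scores from the comparison table.
HOME_ADVANTAGE_BIAS = {
    "GPT-5.4": +0.12,
    "GPT-5.5": 0.00,
    "Claude": +0.86,
    "Grok": -0.99,
    "Gemini": +0.81,
}


def bias_weights(bias_by_model: dict[str, float]) -> dict[str, float]:
    """Assumed scheme: weight is proportional to 1 / (1 + |bias|), normalised to sum to 1."""
    raw = {m: 1.0 / (1.0 + abs(b)) for m, b in bias_by_model.items()}
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}


def weighted_blend(predictions, weights):
    """Weighted mean of per-model outcome probabilities."""
    outcomes = next(iter(predictions.values())).keys()
    return {o: sum(weights[m] * p[o] for m, p in predictions.items()) for o in outcomes}


# Illustrative numbers only, not real model output.
EXAMPLE = {
    "GPT-5.4": {"home": 0.50, "draw": 0.30, "away": 0.20},
    "GPT-5.5": {"home": 0.45, "draw": 0.30, "away": 0.25},
    "Claude":  {"home": 0.55, "draw": 0.25, "away": 0.20},
    "Grok":    {"home": 0.40, "draw": 0.30, "away": 0.30},
    "Gemini":  {"home": 0.60, "draw": 0.25, "away": 0.15},
}

weights = bias_weights(HOME_ADVANTAGE_BIAS)
blend = weighted_blend(EXAMPLE, weights)
print({m: round(w, 3) for m, w in weights.items()})
print({o: round(p, 3) for o, p in blend.items()})
```

In this sketch GPT-5.5 (score 0.00) receives the largest weight and Grok (score -0.99) the smallest. Note that this only down-weights biased models; it does not shift their predictions to undo the bias, which a fuller correction might also do.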

Learn how it works →

When to trust each model

GPT-5.4
The Market Baseline
Trust when:
  • Standard league matches
  • Fixture difficulty assessment
  • Balanced predictions
  • Demographic evaluation consistency
  • Media narrative anchoring
  • Tactical knowledge index
Be cautious when:
  • Fixture difficulty calibration
  • Upset identification
  • Team narrative override
  • Odds integration
  • Form recency integration
GPT-5.5
The Evolved Baseline
Trust when:
  • Counter-narrative situations
  • Undervalued teams
  • Non-prestige matchups
  • Club prestige halo
  • Role value encoding
  • Media narrative anchoring
Be cautious when:
  • Upset identification
  • Team narrative override
  • Odds integration
  • Form recency integration
  • Squad depth & fatigue
Claude Sonnet 4.6
The Analyst
Trust when:
  • Tactical analysis
  • Key player absence impact
  • xG-based predictions
  • Club prestige halo
  • Demographic evaluation consistency
  • Media narrative anchoring
Be cautious when:
  • Fixture difficulty calibration
  • Home advantage calibration
  • Upset identification
  • Team narrative override
  • Odds integration
Grok 3
The Contrarian
Trust when:
  • Odds divergence scenarios
  • Market-informed predictions
  • Upset identification
  • Role value encoding
  • Media narrative anchoring
  • Tactical knowledge index
Be cautious when:
  • Home advantage calibration
  • Upset identification
  • Odds integration
  • Form recency integration
  • Squad depth & fatigue
Gemini 3.1 Pro
The Generalist
Trust when:
  • Squad depth assessment
  • Fatigue factors
  • Cross-competition analysis
  • Club prestige halo
  • Media narrative anchoring
  • Fixture difficulty calibration
Be cautious when:
  • Home advantage calibration
  • Upset identification
  • Team narrative override
  • Odds integration
  • Form recency integration

Watch them compete

104 World Cup matches. All predictions public. Real-time leaderboard.