ModelBall

THE COMPETITORS

Five AIs. Different Biases.
Same Matches.

Each model has been fingerprinted across 45,300 queries.
We know their blind spots. Now we test whether that knowledge helps.

Fingerprint comparison

Distance from center = bias strength. Each spike shows where that AI over- or under-values something.

Every model is given the same evidence — same rankings, same form, same news, same prompt. The differences plotted here are purely artefacts of how each model reasons.

Full blind spot comparison

Bias strength across all dimensions. Red = bigger bias.

ID	Dimension	GPT-5.4	GPT-5.5	Claude	Grok	Gemini
D01	League prestige discount	+1.37	+1.38	+1.35	+1.18	+1.41
D02	Club prestige halo	+0.65	+0.20	+0.19	+0.62	+0.04
D03	Demographic evaluation consistency	-0.11	-0.55	+0.09	+1.17	+1.14
D04	Age curve encoding	+0.75	+0.93	+1.06	+0.87	+0.75
D05	Temporal weighting	-0.61	-0.64	-0.69	-0.61	-0.47
D06	Tournament pedigree encoding	-0.21	-0.64	-0.47	-0.33	-0.57
D07	Attribute type preference	-0.75	-0.94	-1.14	-0.47	-0.31
D08	Role value encoding	-0.58	+0.14	-0.24	-0.00	+1.27
D09	Risk tolerance in selection	-0.70	-1.13	-0.37	-0.88	-0.96
D10	Media narrative anchoring	+0.03	-0.05	-0.16	+0.12	-0.00
D11	Tactical knowledge index	+1.60	+1.33	+1.73	+1.87	+3.47
D12	Tactical context adjustment	-1.30	-1.30	-1.11	-1.44	-0.70
PC01	Fixture difficulty calibration	-0.75	-0.54	-1.08	0.00	-0.04
PC02	Home advantage calibration	+0.12	0.00	+0.86	-0.99	+0.81
PC03	Upset identification	-1.57	-1.57	-1.57	-1.57	-1.57
PC04	Team narrative override	-1.57	-1.57	-1.57	-0.03	-1.57
PC05	Odds integration	-1.05	-1.57	-1.57	+0.64	-1.57
PC06	Form recency integration	-1.57	-1.28	-1.37	-0.79	-1.22
PC09	Squad depth & fatigue	-1.57	-1.33	-1.57	-1.25	-1.07
PC10	Key player absence	-1.30	-1.02	-1.09	-1.18	-1.30
PC11	Stakes & pressure calibration	-1.57	-1.57	-1.57	-1.57	-1.57
PC14	Expected goals integration	-1.57	-1.57	-1.57	-1.57	-1.57

Colour scale: −1.57 (under-correction) to +1.57 (over-correction), with 0 at the midpoint.
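The "Strongest bias" and "Tactical knowledge · #k of 5" labels on the model cards appear consistent with two simple readings of the table: the strongest bias is the dimension with the largest absolute score (ties going to the earlier row), and the rank orders models from smallest to largest absolute score. The page does not state either rule, so the sketch below is an assumption about how the labels could be derived, not the site's actual method. The function names are hypothetical.

```python
# Scores copied from the comparison table; list order matches MODELS.
MODELS = ["GPT-5.4", "GPT-5.5", "Claude", "Grok", "Gemini"]
SCORES = {
    "League prestige discount": [1.37, 1.38, 1.35, 1.18, 1.41],
    "Club prestige halo": [0.65, 0.20, 0.19, 0.62, 0.04],
    "Demographic evaluation consistency": [-0.11, -0.55, 0.09, 1.17, 1.14],
    "Age curve encoding": [0.75, 0.93, 1.06, 0.87, 0.75],
    "Temporal weighting": [-0.61, -0.64, -0.69, -0.61, -0.47],
    "Tournament pedigree encoding": [-0.21, -0.64, -0.47, -0.33, -0.57],
    "Attribute type preference": [-0.75, -0.94, -1.14, -0.47, -0.31],
    "Role value encoding": [-0.58, 0.14, -0.24, 0.00, 1.27],
    "Risk tolerance in selection": [-0.70, -1.13, -0.37, -0.88, -0.96],
    "Media narrative anchoring": [0.03, -0.05, -0.16, 0.12, 0.00],
    "Tactical knowledge index": [1.60, 1.33, 1.73, 1.87, 3.47],
    "Tactical context adjustment": [-1.30, -1.30, -1.11, -1.44, -0.70],
    "Fixture difficulty calibration": [-0.75, -0.54, -1.08, 0.00, -0.04],
    "Home advantage calibration": [0.12, 0.00, 0.86, -0.99, 0.81],
    "Upset identification": [-1.57, -1.57, -1.57, -1.57, -1.57],
    "Team narrative override": [-1.57, -1.57, -1.57, -0.03, -1.57],
    "Odds integration": [-1.05, -1.57, -1.57, 0.64, -1.57],
    "Form recency integration": [-1.57, -1.28, -1.37, -0.79, -1.22],
    "Squad depth & fatigue": [-1.57, -1.33, -1.57, -1.25, -1.07],
    "Key player absence": [-1.30, -1.02, -1.09, -1.18, -1.30],
    "Stakes & pressure calibration": [-1.57, -1.57, -1.57, -1.57, -1.57],
    "Expected goals integration": [-1.57, -1.57, -1.57, -1.57, -1.57],
}


def strongest_bias(model):
    """Dimension with the largest absolute score for one model (assumed rule)."""
    i = MODELS.index(model)
    # max() keeps the first maximal key, so ties go to the earlier table row.
    return max(SCORES, key=lambda dim: abs(SCORES[dim][i]))


def rank_by_smallest_bias(dimension):
    """Models ordered from smallest to largest absolute score (assumed rule)."""
    row = SCORES[dimension]
    order = sorted(range(len(MODELS)), key=lambda i: abs(row[i]))
    return [MODELS[i] for i in order]


for model in MODELS:
    print(model, "->", strongest_bias(model))
print(rank_by_smallest_bias("Tactical knowledge index"))
```

Under these assumptions, a lower rank number means a smaller measured bias on that dimension, not better tactical reasoning in general.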

Meet the models

GPT-5.4

The Market Baseline

Tactical knowledge · #2 of 5

Style: Conservative and balanced predictions

✓ Widely used, moderate across dimensions

✗ Follows market consensus too closely

• Standard league matches
• Fixture difficulty assessment
• Balanced predictions

• Host nation matches
• High-stakes scenarios
• Form recency weighting

Strongest bias: Tactical knowledge

Calibration issues: 9 severe


GPT-5.5

The Evolved Baseline

Tactical knowledge · #1 of 5

Style: More skeptical of popular narratives

✓ Reduced narrative bias vs GPT-5.4

✗ Still anchors on media sentiment

• Counter-narrative situations
• Undervalued teams
• Non-prestige matchups

• High-profile fixtures
• Tournament drama
• Media-hyped scenarios

Strongest bias: Upset identification

Calibration issues: 8 severe


Claude Sonnet 4.6

The Analyst

Tactical knowledge · #3 of 5

Style: Thoughtful but over-corrects for context

✓ Best tactical reasoning, detailed analysis

✗ Over-adjusts home advantage by 15–20%

• Tactical analysis
• Key player absence impact
• xG-based predictions

• Home advantage calibration
• Host nation matches
• Prestige matchups

Strongest bias: Tactical knowledge

Calibration issues: 10 severe


Grok 3

The Contrarian

Tactical knowledge · #4 of 5

Style: Data-driven, sometimes too mechanical

✓ Strong odds integration, market-aware

✗ Ignores home crowd effect entirely

• Odds divergence scenarios
• Market-informed predictions
• Upset identification

• Home advantage scenarios
• Tournament pressure contexts
• Form recency

Strongest bias: Tactical knowledge

Calibration issues: 8 severe


Gemini 3.1 Pro

The Generalist

Tactical knowledge · #5 of 5

Style: Broad knowledge, skewed by reputation

✓ Good at cross-domain reasoning

✗ Biggest league prestige bias (Premier League × 1.4)

• Squad depth assessment
• Fatigue factors
• Cross-competition analysis

• League prestige matchups
• Non-Big 5 leagues
• Tournament narratives

Strongest bias: Tactical knowledge

Calibration issues: 9 severe


Plus two ensemble methods

Beyond the five individual models, we track two ensemble approaches.

Simple Average

Average of all 5 model predictions. Equal weights. The baseline to beat.

If The Edge can't outperform this, knowing blind spots doesn't help.
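As a minimal sketch of the Simple Average baseline, assuming each model outputs a (home win, draw, away win) probability triple, the equal-weight ensemble is just the per-outcome mean. The prediction format and the function name `simple_average` are assumptions for illustration; the page does not specify either.

```python
def simple_average(predictions):
    """Equal-weight mean of per-model (home, draw, away) probability triples."""
    n = len(predictions)
    if n == 0:
        raise ValueError("need at least one prediction")
    return tuple(sum(p[k] for p in predictions) / n for k in range(3))


# Hypothetical predictions from five models for one match.
example = [
    (0.50, 0.30, 0.20),
    (0.45, 0.30, 0.25),
    (0.55, 0.25, 0.20),
    (0.40, 0.30, 0.30),
    (0.60, 0.25, 0.15),
]
print(simple_average(example))
```

If every input triple sums to 1, the averaged triple also sums to 1, so no renormalisation is needed.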

The Edge

Bias-corrected blend. Models weighted by their known blind spots for each match context.
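The page does not describe how The Edge computes its weights. One plausible reading of "weighted by their known blind spots" is to down-weight each model in proportion to its measured bias on the dimension most relevant to the match. The sketch below uses weights proportional to 1 / (1 + |bias|); that formula, the prediction format, and the function name `bias_weighted_blend` are all assumptions for illustration, not The Edge's actual method.

```python
def bias_weighted_blend(predictions, biases):
    """Blend (home, draw, away) triples, down-weighting more biased models.

    predictions: model name -> probability triple
    biases: model name -> bias score on the relevant dimension
    Weight rule (assumed): proportional to 1 / (1 + |bias|).
    """
    raw = {m: 1.0 / (1.0 + abs(biases[m])) for m in predictions}
    total = sum(raw.values())
    weights = {m: w / total for m, w in raw.items()}
    blend = tuple(
        sum(weights[m] * predictions[m][k] for m in predictions)
        for k in range(3)
    )
    return blend, weights


# Home advantage calibration (PC02) scores from the comparison table.
pc02 = {"GPT-5.4": 0.12, "GPT-5.5": 0.00, "Claude": 0.86,
        "Grok": -0.99, "Gemini": 0.81}
# Hypothetical predictions for a match where home advantage matters.
preds = {
    "GPT-5.4": (0.50, 0.30, 0.20),
    "GPT-5.5": (0.45, 0.30, 0.25),
    "Claude": (0.60, 0.25, 0.15),
    "Grok": (0.35, 0.30, 0.35),
    "Gemini": (0.58, 0.25, 0.17),
}
blend, weights = bias_weighted_blend(preds, pc02)
print(blend)
print(weights)
```

With these inputs GPT-5.5 receives the largest weight because its PC02 score is 0.00, while Grok receives the smallest. With equal bias scores the blend reduces to the Simple Average.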


When to trust each model


GPT-5.4

The Market Baseline

Trust when:

✓ Standard league matches
✓ Fixture difficulty assessment
✓ Balanced predictions
✓ Demographic evaluation consistency
✓ Media narrative anchoring
✓ Tactical knowledge index

Be cautious when:

✗ Fixture difficulty calibration
✗ Upset identification
✗ Team narrative override
✗ Odds integration
✗ Form recency integration


GPT-5.5

The Evolved Baseline

Trust when:

✓ Counter-narrative situations
✓ Undervalued teams
✓ Non-prestige matchups
✓ Club prestige halo
✓ Role value encoding
✓ Media narrative anchoring

Be cautious when:

✗ Upset identification
✗ Team narrative override
✗ Odds integration
✗ Form recency integration
✗ Squad depth & fatigue


Claude Sonnet 4.6

The Analyst

Trust when:

✓ Tactical analysis
✓ Key player absence impact
✓ xG-based predictions
✓ Club prestige halo
✓ Demographic evaluation consistency
✓ Media narrative anchoring

Be cautious when:

✗ Fixture difficulty calibration
✗ Home advantage calibration
✗ Upset identification
✗ Team narrative override
✗ Odds integration


Grok 3

The Contrarian

Trust when:

✓ Odds divergence scenarios
✓ Market-informed predictions
✓ Upset identification
✓ Role value encoding
✓ Media narrative anchoring
✓ Tactical knowledge index

Be cautious when:

✗ Home advantage calibration
✗ Upset identification
✗ Odds integration
✗ Form recency integration
✗ Squad depth & fatigue


Gemini 3.1 Pro

The Generalist

Trust when:

✓ Squad depth assessment
✓ Fatigue factors
✓ Cross-competition analysis
✓ Club prestige halo
✓ Media narrative anchoring
✓ Fixture difficulty calibration

Be cautious when:

✗ Home advantage calibration
✗ Upset identification
✗ Team narrative override
✗ Odds integration
✗ Form recency integration

Watch them compete

104 World Cup matches. All predictions public. Real-time leaderboard.
