Home advantage: Claude thinks it matters more than it does
Home advantage is real. Teams win more often at home, across every league and every era. But how much does it actually matter? Our testing reveals that Claude consistently over-estimates home advantage by 8-12 percentage points — and in a World Cup hosted across three countries, this bias will be triggered constantly.
The calibration test
Our PC02 dimension (Home Advantage Calibration) measures how models adjust predictions when the same matchup is presented in different venues. We give models a fixture — say, Spain vs Germany — and ask for win probabilities when Spain is at home, when Germany is at home, and at a neutral venue.
Historical data tells us that home advantage in international football is worth roughly 5-8 percentage points in win probability. It varies by confederation, by altitude, by travel distance — but as a baseline, that's the range most research supports.
Claude doesn't agree. In our testing, Claude adds 15-20 percentage points for home advantage, roughly two to three times the historical baseline.
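As a rough illustration, the venue comparison described above could be sketched like this. The function names and the Spain vs Germany probabilities are hypothetical and are not taken from the actual test harness; only the 5-8 point baseline range comes from the text.

```python
# Historical baseline range quoted above, in percentage points.
BASELINE_PP = (5.0, 8.0)


def home_advantage_pp(p_home_venue, p_neutral_venue):
    """Percentage points a model adds to a team's win probability at home."""
    return (p_home_venue - p_neutral_venue) * 100


def classify_adjustment(adjustment_pp, baseline=BASELINE_PP):
    """Compare an adjustment against the historical baseline range."""
    low, high = baseline
    if adjustment_pp < low:
        return "under-adjusts"
    if adjustment_pp > high:
        return "over-adjusts"
    return "in line with baseline"


# Hypothetical win probabilities for Spain against Germany at each venue.
spain_win = {"spain_home": 0.55, "neutral": 0.38, "germany_home": 0.24}

adjustment = home_advantage_pp(spain_win["spain_home"], spain_win["neutral"])
print(f"{adjustment:.1f}pp -> {classify_adjustment(adjustment)}")
```

With these made-up inputs the adjustment is 17 percentage points, which falls in the range reported for Claude and well above the baseline.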
[Chart: Home advantage adjustment by model, in percentage points (pp) added to the home team's win probability]
Why Claude over-adjusts
We can't know exactly what's in Claude's training data, but we can observe the pattern. Claude appears to be trained to reason carefully about uncertainty and context. When it considers home advantage, it seems to compound multiple factors: crowd support, familiar conditions, travel fatigue for opponents, referee bias, scheduling benefits.
Each factor is real, but Claude appears to stack them rather than recognizing that they're already captured in the historical baseline. The result is over-adjustment — treating home advantage as more impactful than evidence supports.
Interestingly, Grok shows the opposite tendency. It barely adjusts for home advantage at all, sometimes giving the home team only 2-3 percentage points more than at a neutral venue. This under-adjustment has its own problems, but it means Grok and Claude act as natural counterweights in our ensemble.
The host nation problem
World Cup 2026 is hosted by the United States, Mexico, and Canada. This creates a specific challenge: matches in these countries will trigger Claude's home advantage bias even when the match isn't technically a home game for the host.
Consider: USA plays its group matches in American stadiums. Claude will give USA an extra 15-20 percentage points of home advantage. The market, the other models, and basic logic will also credit USA with some home advantage, so the question is whether Claude over-adjusts beyond what's already priced in.
Host nation group stage matches to watch
- USA vs Wales — Claude will heavily favor USA. If Wales keeps it close, Claude was overcorrecting.
- Mexico vs Poland — Azteca atmosphere is legendary. Will Claude's adjustment match reality?
- Canada vs Belgium — Canada's first World Cup home match ever. Emotional boost vs quality gap.
What we saw in calibration
In our 12-match calibration test, Claude's home advantage over-adjustment showed up clearly. The most dramatic case was a friendly where the away team won convincingly despite Claude giving them only a 22% chance of winning. The market had them at 31%.
Across all calibration matches, Claude's Brier score (our accuracy metric, where lower is better) was the worst of the five models at 0.561. Much of this came from home matches where Claude's confidence in the home team proved excessive.
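For readers unfamiliar with the metric, here is a minimal sketch of a Brier score for three-way football predictions. It assumes the multi-class form (squared error summed over home win, draw, and away win, then averaged across matches); the text does not say which variant produced the 0.561 figure, and the forecasts below are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean multi-class Brier score; 0 is perfect, lower is better.

    forecasts: list of dicts mapping each outcome label to a probability.
    outcomes:  list of the labels that actually occurred.
    """
    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        # Squared error against a one-hot vector for the realized outcome.
        total += sum(
            (p - (1.0 if label == actual else 0.0)) ** 2
            for label, p in probs.items()
        )
    return total / len(forecasts)


# Hypothetical forecasts: one confident miss on the home team, one decent call.
forecasts = [
    {"home": 0.70, "draw": 0.10, "away": 0.20},
    {"home": 0.50, "draw": 0.30, "away": 0.20},
]
outcomes = ["away", "home"]

print(round(brier_score(forecasts, outcomes), 3))
```

Under this form, a forecaster who always assigns one third to each outcome scores about 0.667, which gives a sense of scale for the numbers reported here. A confident miss on the home team is penalized heavily, which is how excess home confidence drags the average up.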
| Scenario | Claude prediction | Actual outcome rate | Gap (predicted minus actual) |
|---|---|---|---|
| Home favorites | 68% | 54% | +14pp |
| Away favorites | 41% | 48% | -7pp |
| Neutral venue | 52% | 51% | +1pp |
The pattern is clear: Claude is well-calibrated at neutral venues but systematically over-adjusts when a team plays at home.
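The gap column is simply the predicted rate minus the observed rate, expressed in percentage points. A small sketch using the figures from the table (the helper name `gap_pp` is ours for illustration):

```python
# (scenario, Claude's predicted rate, actual outcome rate) from the table above.
rows = [
    ("Home favorites", 0.68, 0.54),
    ("Away favorites", 0.41, 0.48),
    ("Neutral venue", 0.52, 0.51),
]


def gap_pp(predicted, actual):
    """Calibration gap in whole percentage points; positive means overconfident."""
    return round((predicted - actual) * 100)


gaps = {scenario: gap_pp(pred, act) for scenario, pred, act in rows}
for scenario, gap in gaps.items():
    print(f"{scenario}: {gap:+d}pp")
```

A positive gap means Claude was more confident than the results justified, and a negative gap means it was not confident enough.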
How we correct for this
In The Edge ensemble, we tag each match with contextual features including “host_nation_playing” and “strong_home_advantage” flags. When these are present, Claude's weight in the ensemble is reduced from 25% to approximately 18%.
This isn't about distrusting Claude entirely — its caution is valuable in other contexts. But for matches where home advantage is the key variable, we lean more heavily on Grok and Gemini, which show more historically-aligned adjustments.
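The reweighting could look something like the sketch below. Only Claude's 25% and 18% weights and the two flag names come from the text. The other base weights, the placeholder model names `model_d` and `model_e`, and the choice to split the freed weight equally between Grok and Gemini are assumptions made for illustration, not a description of the production ensemble.

```python
# Flags named in the text that trigger the reduced Claude weight.
TRIGGER_FLAGS = ("host_nation_playing", "strong_home_advantage")

# Hypothetical base weights; only Claude's 0.25 is taken from the article.
BASE_WEIGHTS = {
    "claude": 0.25,
    "grok": 0.20,
    "gemini": 0.20,
    "model_d": 0.20,
    "model_e": 0.15,
}


def adjust_weights(base, match_flags, reduced=0.18, beneficiaries=("grok", "gemini")):
    """Return ensemble weights for one match, still summing to 1."""
    weights = dict(base)
    if not any(match_flags.get(flag, False) for flag in TRIGGER_FLAGS):
        return weights  # no home-advantage flag set: keep base weights
    freed = weights["claude"] - reduced
    weights["claude"] = reduced
    # Assumption: the freed weight is shared equally by the beneficiaries.
    for name in beneficiaries:
        weights[name] += freed / len(beneficiaries)
    return weights


host_match = adjust_weights(BASE_WEIGHTS, {"host_nation_playing": True})
print({name: round(w, 3) for name, w in host_match.items()})
```

Keeping the weights normalized means the reduction in Claude's influence is fully transferred to the other models rather than shrinking the ensemble's overall output.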
What to watch for
Throughout the tournament, we'll track how each model performs on host nation matches specifically. If Claude's over-adjustment hurts its accuracy on these fixtures while Grok's under-adjustment helps, that validates our weighting strategy.
But outcomes are noisy. Home teams might genuinely outperform in 2026 — maybe the crowds in American stadiums create more advantage than historical baselines suggest. We're not claiming Claude is wrong; we're claiming it's biased relative to historical evidence, and testing whether that bias hurts predictions.
Tomorrow, we'll zoom out and ask: what happens when models agree versus disagree? Is consensus a sign of accuracy, or does divergence reveal hidden edge?