THE LEADERBOARD
Six Methods. Live Results.
Ranked by Brier score, lowest first. The key question: does knowing each AI's blind spots improve predictions?
Full standings
| Rank | Method | Matches | Correct | Accuracy | Brier | vs Avg |
|---|---|---|---|---|---|---|
| 1 | Grok 3 (The Contrarian) | 29 | 17 | 59% | 0.571 | −0.008 |
| 2 | The Edge★ (Bias-corrected) | 29 | 17 | 59% | 0.576 | −0.002 |
| 3 | GPT-5.4 (The Market Baseline) | 29 | 16 | 55% | 0.578 | −0.000 |
| 4 | Simple Average (Equal-weight blend) | 29 | 17 | 59% | 0.578 | — |
| 5 | Claude Sonnet 4.6 (The Analyst) | 29 | 17 | 59% | 0.586 | +0.007 |
| 6 | Gemini 3.1 Pro (The Generalist) | 29 | 17 | 59% | 0.587 | +0.009 |
Accuracy: Percentage of match outcomes correctly predicted.
Brier: Mean squared error between the predicted probabilities and the actual result (lower = better, 0 = perfect).
vs Avg: Brier score difference from the Simple Average. Negative = beating the average, positive = behind it.
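To make the three columns concrete, here is a minimal Python sketch of how they could be computed. It assumes the multi-class form of the Brier score (squared error summed over the possible outcomes, then averaged over matches) and three outcomes per match (home win, draw, away win). The page does not state which variant the leaderboard uses, and the function names and sample forecasts are illustrative only.

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Multi-class Brier score (assumed variant): mean over matches of the
    squared error summed across all possible outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need one outcome per forecast, and at least one match")
    total = 0.0
    for p, actual in zip(probs, outcomes):
        # Squared distance between the forecast and the one-hot actual result.
        total += sum((pk - (1.0 if k == actual else 0.0)) ** 2
                     for k, pk in enumerate(p))
    return total / len(probs)


def accuracy(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Share of matches where the most likely predicted outcome happened.
    Ties go to the lowest outcome index."""
    hits = 0
    for p, actual in zip(probs, outcomes):
        predicted = max(range(len(p)), key=lambda k: p[k])
        hits += int(predicted == actual)
    return hits / len(probs)


def vs_average(method_brier: float, average_brier: float) -> float:
    """Negative means the method beats the simple average."""
    return method_brier - average_brier


# Illustrative data only: [home win, draw, away win] probabilities per match.
forecasts = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]]
results = [0, 2, 1]  # index of the outcome that actually occurred

print(brier_score(forecasts, results))
print(accuracy(forecasts, results))
```

Under this assumed variant, a perfect forecast scores 0 and an even three-way split (1/3 on each outcome) scores about 0.667. In the Accuracy column, 17 correct out of 29 rounds to 59% and 16 out of 29 rounds to 55%.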
Understanding the methods
The five models
GPT-5.4, GPT-5.5, Claude, Grok, and Gemini each make predictions independently. Each has documented biases from our fingerprinting research.
Simple average
Equal-weight blend of all five AI predictions. The baseline — if The Edge can't beat this, our corrections don't add value.
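As a sketch of the equal-weight blend, assuming each model outputs one probability per possible outcome of a match (the function name and the numbers are illustrative, not taken from the site):

```python
from typing import List, Sequence


def simple_average(forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight blend: average each outcome's probability across models."""
    if not forecasts:
        raise ValueError("need at least one model forecast")
    n_models = len(forecasts)
    # zip(*forecasts) groups the probabilities by outcome instead of by model.
    return [sum(column) / n_models for column in zip(*forecasts)]


# Two hypothetical models forecasting [home win, draw, away win].
model_forecasts = [[0.5, 0.3, 0.2], [0.3, 0.3, 0.4]]
blend = simple_average(model_forecasts)
print(blend)
```

Because every input sums to 1, the blended probabilities also sum to 1, so no renormalisation is needed.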
The Edge
Bias-corrected blend. We know each AI's blind spots, so we trust them less in those situations.
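The page does not spell out how The Edge applies its corrections. One way to read "trust them less" is a weighted blend in which a model's weight is reduced for matches that fall in one of its known blind spots. The sketch below assumes that reading; the function name, model labels, trust values, and forecasts are all hypothetical.

```python
from typing import Dict, List, Sequence


def trust_weighted_blend(forecasts: Dict[str, Sequence[float]],
                         trust: Dict[str, float]) -> List[float]:
    """Blend forecasts with per-model trust weights (hypothetical scheme).

    A trust of 1.0 means full weight; lower values down-weight a model
    for a match that hits one of its assumed blind spots.
    """
    total = sum(trust[name] for name in forecasts)
    if total <= 0:
        raise ValueError("at least one model needs positive trust")
    n_outcomes = len(next(iter(forecasts.values())))
    return [
        sum(trust[name] * probs[k] for name, probs in forecasts.items()) / total
        for k in range(n_outcomes)
    ]


# Hypothetical forecasts for one match: [home win, draw, away win].
forecasts = {"model_a": [0.6, 0.2, 0.2], "model_b": [0.2, 0.3, 0.5]}

# Equal trust reproduces the plain average.
equal = trust_weighted_blend(forecasts, {"model_a": 1.0, "model_b": 1.0})

# Halving trust in model_b pulls the blend toward model_a.
corrected = trust_weighted_blend(forecasts, {"model_a": 1.0, "model_b": 0.5})
print(equal, corrected)
```

With equal trust the result matches the equal-weight baseline, which makes the comparison between the two methods direct: any difference in score comes only from the trust adjustments.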
Every outcome teaches
If The Edge leads, the fingerprints help. If the Simple Average leads, we learn and refine. Either outcome advances the research.
Which players did AI misprice?
After 104 matches, we'll know exactly which players the models systematically undervalued or overvalued. Get the full Transfer Arbitrage Report in July 2026.