Metrics & Evaluation
Brier Score
Measure the accuracy of probabilistic predictions. Lower is better. Critical for evaluating calibration of betting models.
The Brier Score Formula
BS = (1/N) × Σ (p_i - o_i)²
- p_i = Predicted probability
- o_i = Actual outcome (0 or 1)
- N = Number of predictions
Interpretation
- 0.0 = Perfect predictions
- 0.25 = Random guessing (predicting 50% for everything)
- 1.0 = Always 100% confident in the wrong outcome
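The 0.25 baseline follows directly from the formula: always predicting 0.5 scores (0.5 - 1)² = (0.5 - 0)² = 0.25 on every outcome, whatever the mix of results. A quick sanity check in R:

```r
# Always predicting 0.5 scores 0.25 regardless of the outcome mix
outcomes <- c(1, 0, 1, 1, 0)   # arbitrary results
mean((0.5 - outcomes)^2)       # 0.25
```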
Single Prediction
Example: you predict 70% and the event happens (outcome = 1):
(0.70 - 1)² = 0.0900
Excellent prediction!
Sample Model
Over a sample of 100 predictions, the model's average Brier score is 0.1730.
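One way to see where a figure like this comes from is to simulate a perfectly calibrated model. The forecast distribution below is an assumption for illustration, not the sample model's actual generator; for calibrated forecasts the expected Brier score is E[p(1 - p)], which is 1/6 ≈ 0.167 when forecasts are uniform on [0, 1].

```r
# Hypothetical simulation: a perfectly calibrated model over 100 predictions.
# The uniform forecast distribution is an assumption for illustration.
set.seed(42)
p <- runif(100)                        # forecast probabilities
o <- rbinom(100, size = 1, prob = p)   # outcomes drawn from those probabilities
mean((p - o)^2)                        # average Brier score, ~0.167 in expectation
```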
Calibration Chart
A well-calibrated model has predicted % ≈ actual %: when you predict 70%, it should happen about 70% of the time (see plot_calibration in the R code below).
Brier Score Benchmarks

| Rating | Brier Score | Meaning |
| --- | --- | --- |
| Perfect | 0.00 | Always predicts exactly right |
| Excellent | 0.10 | Tournament winning |
| Good | 0.20 | Useful for betting |
| Average | 0.25 | Random baseline |
| Poor | 0.35 | Worse than guessing |
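If you want to apply these labels in code, a tiny helper works; the cutoffs below are assumed midpoints between the anchor scores, not part of the table above.

```r
# Hypothetical helper: label a Brier score using the benchmarks above.
# Cutoffs are assumed midpoints between the anchor scores.
benchmark_label <- function(bs) {
  cut(bs, breaks = c(-Inf, 0.05, 0.15, 0.225, 0.30, Inf),
      labels = c("Perfect", "Excellent", "Good", "Average", "Poor"))
}
benchmark_label(0.1730)   # "Good"
```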
Brier Score Decomposition
Calibration
How well predicted probabilities match observed frequencies.
70% predictions should win ~70% of the time.
Resolution
How much predictions vary from base rate.
Always predicting 50% = no resolution.
Uncertainty
Inherent unpredictability of outcomes.
Can't be reduced; it peaks at a 50% base rate.
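These three terms are the classic Murphy decomposition, BS = calibration - resolution + uncertainty. A binned sketch in R (the decomposition is exact only when all forecasts within a bin are identical):

```r
# Murphy decomposition of the Brier score via binning.
# Approximate: exact only if every forecast in a bin is identical.
brier_decomposition <- function(predicted, actual, n_bins = 10) {
  bins <- cut(predicted, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  n_k <- tapply(actual, bins, length)    # forecasts per bin
  f_k <- tapply(predicted, bins, mean)   # mean forecast per bin
  o_k <- tapply(actual, bins, mean)      # observed frequency per bin
  keep <- !is.na(n_k)                    # drop empty bins
  n_k <- n_k[keep]; f_k <- f_k[keep]; o_k <- o_k[keep]
  o_bar <- mean(actual)                  # base rate
  n <- sum(n_k)
  list(
    calibration = sum(n_k * (f_k - o_k)^2) / n,    # lower is better
    resolution  = sum(n_k * (o_k - o_bar)^2) / n,  # higher is better
    uncertainty = o_bar * (1 - o_bar)              # fixed by the data
  )
}
```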
Sports Pricing Applications
Model Evaluation
- ✓ Compare different projection models
- ✓ Track model performance over time
- ✓ Identify miscalibrated probability bins
Pricing Validation
- ✓ Verify implied probabilities are accurate
- ✓ Compare to closing line performance (see the sketch below)
- ✓ Segment by sport/market for tuning
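For the closing-line comparison, one hedged sketch: convert closing decimal odds to implied probabilities (naively ignoring the vig) and score both the model and the line against results, using the brier_score function from the R code below. The data frame and column names here are assumptions for illustration.

```r
# Hypothetical closing-line comparison; data and column names are illustrative.
results <- data.frame(
  model_prob   = c(0.62, 0.45, 0.71, 0.55),  # our model's probabilities
  closing_odds = c(1.70, 2.30, 1.45, 1.90),  # decimal closing odds
  won          = c(1, 0, 1, 0)               # actual outcomes
)
implied_prob <- 1 / results$closing_odds       # naive: no vig removal
brier_score(results$model_prob, results$won)   # model's Brier score
brier_score(implied_prob, results$won)         # closing line's Brier score
```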
R Code Equivalent

```r
library(ggplot2)

# Calculate the Brier score: mean squared error of predicted
# probabilities against binary outcomes
brier_score <- function(predicted, actual) {
  mean((predicted - actual)^2)
}

# Calibration plot: bin the predictions, then compare each bin's
# mean predicted probability to its observed win rate
plot_calibration <- function(predicted, actual, n_bins = 10) {
  bins <- cut(predicted, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  calibration <- data.frame(
    bin       = levels(bins),
    predicted = tapply(predicted, bins, mean),
    actual    = tapply(actual, bins, mean)
  )
  ggplot(calibration, aes(x = predicted)) +
    geom_line(aes(y = predicted), linetype = "dashed") +  # perfect-calibration diagonal
    geom_point(aes(y = actual), color = "green") +
    labs(x = "Predicted", y = "Actual") +
    theme_minimal()
}

# Example: a single 70% prediction on an event that happened
predicted <- c(0.7)
actual <- c(1)
bs <- brier_score(predicted, actual)
cat(sprintf("Brier Score: %.4f\n", bs))
```

Key Takeaways
- Brier Score: lower = better (0 = perfect)
- 0.25 is the random-guessing baseline
- Penalizes confident wrong predictions heavily
- Use calibration plots to diagnose issues
- Decompose into calibration + resolution
- Track over time to detect model drift