Metrics & Evaluation
Brier Score
Measure the accuracy of probabilistic predictions. Lower is better. Critical for evaluating calibration of betting models.
The Brier Score Formula
BS = (1/N) × Σ (p_i - o_i)²
- p_i = Predicted probability
- o_i = Actual outcome (0 or 1)
- N = Number of predictions
Interpretation
- 0.0 = Perfect predictions
- 0.25 = Random guessing (predicting 50% for everything)
- 1.0 = Always 100% confident in the wrong outcome
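The 0.25 baseline follows directly from the formula: always predicting 0.5 scores (0.5 - 1)² = (0.5 - 0)² = 0.25 on every outcome, whatever the mix of results. A quick sanity check in R:

```r
# Always predicting 0.5 scores 0.25 regardless of the outcome mix
outcomes <- c(1, 0, 1, 1, 0)   # arbitrary results
mean((0.5 - outcomes)^2)       # 0.25
```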
Single Prediction
Example: you predict 70% and the event happens (outcome = 1):
(0.70 - 1)² = 0.0900
Excellent prediction!
Sample Model
Over a sample of 100 predictions, the model's average Brier score is 0.1730.
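One way to see where a figure like this comes from is to simulate a perfectly calibrated model. The forecast distribution below is an assumption for illustration, not the sample model's actual generator; for calibrated forecasts the expected Brier score is E[p(1 - p)], which is 1/6 ≈ 0.167 when forecasts are uniform on [0, 1].

```r
# Hypothetical simulation: a perfectly calibrated model over 100 predictions.
# The uniform forecast distribution is an assumption for illustration.
set.seed(42)
p <- runif(100)                        # forecast probabilities
o <- rbinom(100, size = 1, prob = p)   # outcomes drawn from those probabilities
mean((p - o)^2)                        # average Brier score, ~0.167 in expectation
```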
Calibration Chart
A well-calibrated model has predicted % ≈ actual %: when you predict 70%, it should happen about 70% of the time (see plot_calibration in the R code below).
Brier Score Benchmarks

| Rating | Brier Score | Meaning |
| --- | --- | --- |
| Perfect | 0.00 | Always predicts exactly right |
| Excellent | 0.10 | Tournament winning |
| Good | 0.20 | Useful for betting |
| Average | 0.25 | Random baseline |
| Poor | 0.35 | Worse than guessing |
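If you want to apply these labels in code, a tiny helper works; the cutoffs below are assumed midpoints between the anchor scores, not part of the table above.

```r
# Hypothetical helper: label a Brier score using the benchmarks above.
# Cutoffs are assumed midpoints between the anchor scores.
benchmark_label <- function(bs) {
  cut(bs, breaks = c(-Inf, 0.05, 0.15, 0.225, 0.30, Inf),
      labels = c("Perfect", "Excellent", "Good", "Average", "Poor"))
}
benchmark_label(0.1730)   # "Good"
```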
Brier Score Decomposition
Calibration
How well predicted probabilities match observed frequencies.
70% predictions should win ~70% of the time.
Resolution
How much predictions vary from base rate.
Always predicting 50% = no resolution.
Uncertainty
Inherent unpredictability of outcomes.
Can't be reduced; it peaks at a 50% base rate.
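These three terms are the classic Murphy decomposition, BS = calibration - resolution + uncertainty. A binned sketch in R (the decomposition is exact only when all forecasts within a bin are identical):

```r
# Murphy decomposition of the Brier score via binning.
# Approximate: exact only if every forecast in a bin is identical.
brier_decomposition <- function(predicted, actual, n_bins = 10) {
  bins <- cut(predicted, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  n_k <- tapply(actual, bins, length)    # forecasts per bin
  f_k <- tapply(predicted, bins, mean)   # mean forecast per bin
  o_k <- tapply(actual, bins, mean)      # observed frequency per bin
  keep <- !is.na(n_k)                    # drop empty bins
  n_k <- n_k[keep]; f_k <- f_k[keep]; o_k <- o_k[keep]
  o_bar <- mean(actual)                  # base rate
  n <- sum(n_k)
  list(
    calibration = sum(n_k * (f_k - o_k)^2) / n,    # lower is better
    resolution  = sum(n_k * (o_k - o_bar)^2) / n,  # higher is better
    uncertainty = o_bar * (1 - o_bar)              # fixed by the data
  )
}
```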
Sports Pricing Applications
Model Evaluation
- ✓ Compare different projection models
- ✓ Track model performance over time
- ✓ Identify miscalibrated probability bins
Pricing Validation
- ✓ Verify implied probabilities are accurate
- ✓ Compare to closing line performance (see the sketch below)
- ✓ Segment by sport/market for tuning
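For the closing-line comparison, one hedged sketch: convert closing decimal odds to implied probabilities (naively ignoring the vig) and score both the model and the line against results, using the brier_score function from the R code below. The data frame and column names here are assumptions for illustration.

```r
# Hypothetical closing-line comparison; data and column names are illustrative.
results <- data.frame(
  model_prob   = c(0.62, 0.45, 0.71, 0.55),  # our model's probabilities
  closing_odds = c(1.70, 2.30, 1.45, 1.90),  # decimal closing odds
  won          = c(1, 0, 1, 0)               # actual outcomes
)
implied_prob <- 1 / results$closing_odds       # naive: no vig removal
brier_score(results$model_prob, results$won)   # model's Brier score
brier_score(implied_prob, results$won)         # closing line's Brier score
```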
R Code Equivalent

```r
library(ggplot2)

# Calculate the Brier score: mean squared error of predicted
# probabilities against binary outcomes
brier_score <- function(predicted, actual) {
  mean((predicted - actual)^2)
}

# Calibration plot: bin the predictions, then compare each bin's
# mean predicted probability to its observed win rate
plot_calibration <- function(predicted, actual, n_bins = 10) {
  bins <- cut(predicted, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  calibration <- data.frame(
    bin       = levels(bins),
    predicted = tapply(predicted, bins, mean),
    actual    = tapply(actual, bins, mean)
  )
  ggplot(calibration, aes(x = predicted)) +
    geom_line(aes(y = predicted), linetype = "dashed") +  # perfect-calibration diagonal
    geom_point(aes(y = actual), color = "green") +
    labs(x = "Predicted", y = "Actual") +
    theme_minimal()
}

# Example: a single 70% prediction on an event that happened
predicted <- c(0.7)
actual <- c(1)
bs <- brier_score(predicted, actual)
cat(sprintf("Brier Score: %.4f\n", bs))
```

Key Takeaways
- Brier Score: lower = better (0 = perfect)
- 0.25 is the random-guessing baseline
- Penalizes confident wrong predictions heavily
- Use calibration plots to diagnose issues
- Decompose into calibration + resolution
- Track over time to detect model drift