Anomaly Detection

Find unusual patterns that don't conform to expected behavior. Essential for fraud detection, identifying sharp bettors, and data quality checks.

📊 Types of Anomalies

Point Anomalies

A single data point is unusual on its own

Contextual

A value is unusual for its context or time

Collective

A group of points is unusual together
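
To make the difference between a point anomaly and a contextual one concrete, here is a minimal sketch in base R. The temperature data and the `season` label are hypothetical, made up for illustration. A reading of 25 is unremarkable across the whole data set but stands out once it is compared only with other winter readings.

```r
# Hypothetical daily temperatures (deg C) tagged with a season label.
temps <- data.frame(
  season = rep(c("winter", "summer"), each = 21),
  temp   = c(rep(0:4, 4), 25,       # winter: the final 25 is out of context
             rep(23:27, 4), 25),    # summer: 25 is entirely ordinary
  stringsAsFactors = FALSE
)

threshold <- 2.5

# Global z-score: ignores context, so a winter 25 looks like a summer 25.
global_z    <- (temps$temp - mean(temps$temp)) / sd(temps$temp)
global_flag <- abs(global_z) > threshold

# Contextual z-score: standardize each value within its own season.
context_z    <- ave(temps$temp, temps$season,
                    FUN = function(v) (v - mean(v)) / sd(v))
context_flag <- abs(context_z) > threshold

sum(global_flag)        # no rows flagged globally
temps[context_flag, ]   # only the winter reading of 25 is flagged
```

The same grouping idea applies to time-of-day or day-of-week context: standardize within the group that defines "expected" behavior.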

Detection Parameters

Z-Score Threshold: 2.5 (range 1 to 4)
True Anomaly Rate (%): 5 (range 1 to 15)
Sample Size: 100 (range 50 to 200)

📊 Detection Performance

Precision 75%
Recall 100%
F1 Score 86%
TP: 3
FP: 1
FN: 0
TN: 96
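
The percentages in this panel follow directly from the confusion counts. A small sketch of the arithmetic, where `detection_metrics` is a hypothetical helper name and not part of any package:

```r
# Precision, recall, and F1 from confusion-matrix counts.
# Note: returns NaN when a denominator is zero (e.g. nothing was flagged).
detection_metrics <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)   # share of flagged points that are real anomalies
  recall    <- tp / (tp + fn)   # share of real anomalies that were flagged
  f1        <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f1 = f1)
}

metrics <- detection_metrics(tp = 3, fp = 1, fn = 0)
round(metrics * 100)
# precision = 75, recall = 100, f1 = 86
```

TN does not enter any of the three metrics, which is why they stay informative even when normal points vastly outnumber anomalies.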

Data Visualization

[Interactive plot. Legend: normal points, true anomalies, detection boundary]

Threshold Trade-off

↓ Lower Threshold

  • ✓ Higher recall (catch more)
  • ✗ More false positives

↑ Higher Threshold

  • ✓ Higher precision (fewer alerts)
  • ✗ Miss more anomalies
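
The trade-off can be seen by sweeping the threshold over simulated data. This is a sketch under stated assumptions: the seed, the shift of 6 applied to the anomalies, and the helper name `sweep_threshold` are all illustrative choices, not values taken from the interactive demo.

```r
set.seed(42)                           # arbitrary seed for repeatability
n <- 100
is_anomaly <- seq_len(n) <= 5          # 5% true anomaly rate
x <- rnorm(n)                          # normal points
x[is_anomaly] <- x[is_anomaly] + 6     # assumed shift for the anomalies

z <- as.numeric(scale(x))              # z-scores computed once, reused below

# Count hits and misses at one threshold.
sweep_threshold <- function(threshold) {
  flagged <- abs(z) > threshold
  tp <- sum(flagged & is_anomaly)
  fp <- sum(flagged & !is_anomaly)
  fn <- sum(!flagged & is_anomaly)
  c(threshold = threshold,
    flagged   = sum(flagged),
    precision = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
    recall    = tp / (tp + fn))
}

results <- t(sapply(c(1, 1.5, 2, 2.5, 3, 4), sweep_threshold))
results
```

Because raising the threshold can only shrink the flagged set, the `flagged` and `recall` columns never increase from one row to the next. Precision usually rises, but it is undefined (`NA`) once nothing is flagged.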

🔧 Detection Methods

Z-Score

Distance from the mean, measured in standard deviations

Best for: Univariate, approximately normal data

IQR

More than 1.5×IQR below Q1 or above Q3

Best for: Skewed data (robust to extreme values)

Isolation Forest

Random splits isolate anomalies in fewer steps than normal points

Best for: High-dimensional data

DBSCAN

Density-based clustering

Best for: Clusters of anomalies
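
The density idea behind DBSCAN can be sketched in base R without running the full clustering. In DBSCAN, a point is noise when it is neither a core point (at least `min_pts` neighbors within `eps`, counting itself) nor within `eps` of a core point. The helper `dbscan_noise` below is a hypothetical name and only flags noise; it does not assign cluster labels, and the grid data is made up for illustration.

```r
# Flag DBSCAN-style noise points using pairwise distances.
dbscan_noise <- function(X, eps, min_pts) {
  d <- as.matrix(dist(X))                    # pairwise Euclidean distances
  neighbors <- d <= eps                      # n x n logical; diagonal is TRUE
  is_core <- rowSums(neighbors) >= min_pts   # dense enough to seed a cluster
  # TRUE when a point is a core point or lies within eps of one (border point)
  reachable <- rowSums(neighbors[, is_core, drop = FALSE]) > 0
  unname(!reachable)
}

# A tight 5 x 5 grid of points plus one isolated point far away.
grid <- as.matrix(expand.grid(x = seq(0, 0.4, by = 0.1),
                              y = seq(0, 0.4, by = 0.1)))
X <- rbind(grid, c(5, 5))

noise <- dbscan_noise(X, eps = 0.15, min_pts = 4)
which(noise)   # row 26, the isolated point
```

The pairwise distance matrix makes this quadratic in the number of points, so it is suitable only for small samples.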

🎰 Betting Applications

Fraud Detection

Unusual betting patterns

Sharp Detection

Closing line value (CLV) higher than expected

Line Errors

Odds far from consensus

Data Quality

Stats outside possible range
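
Two of these applications reduce to simple checks. The sketch below is illustrative only: the book names, odds, box-score rows, and the 0.03 tolerance are all hypothetical. It treats the median implied probability across books as the consensus and flags stats that cannot occur.

```r
# Line errors: decimal odds on the same outcome from six hypothetical books.
odds <- c(book_a = 1.91, book_b = 1.93, book_c = 1.90,
          book_d = 1.92, book_e = 2.45, book_f = 1.91)

implied    <- 1 / odds                 # implied probability (vig not removed)
consensus  <- median(implied)          # median is not dragged by one bad line
tolerance  <- 0.03                     # assumed: 3 percentage points
line_error <- abs(implied - consensus) > tolerance
names(odds)[line_error]                # "book_e"

# Data quality: shooting stats that are outside the possible range.
stats <- data.frame(
  player   = c("p1", "p2", "p3"),
  made     = c(5, 12, 7),
  attempts = c(10, 9, 14),
  pct      = c(0.50, 1.33, 0.50),
  stringsAsFactors = FALSE
)
impossible <- stats$made > stats$attempts | stats$pct < 0 | stats$pct > 1
stats$player[impossible]               # "p2"
```

Range checks like the second one need no statistics at all, so they are a cheap first pass before any model-based detection.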

R Code Equivalent

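# Anomaly detection methods
library(isotree)  # provides isolation.forest()

# Z-score method: flag values more than `threshold` standard deviations from the mean
zscore_anomaly <- function(x, threshold = 2.5) {
  z <- as.numeric(scale(x))  # scale() returns a one-column matrix; flatten it
  abs(z) > threshold
}

# IQR method: flag values more than k * IQR below Q1 or above Q3
iqr_anomaly <- function(x, k = 1.5) {
  q1 <- quantile(x, 0.25, names = FALSE)
  q3 <- quantile(x, 0.75, names = FALSE)
  iqr <- q3 - q1
  x < (q1 - k * iqr) | x > (q3 + k * iqr)
}

# Isolation Forest: flag the top `contamination` share of anomaly scores
iforest_anomaly <- function(df, contamination = 0.05) {
  model <- isolation.forest(df, ntrees = 100)
  scores <- predict(model, df)  # higher score = more anomalous
  scores > quantile(scores, 1 - contamination)
}

# Apply (assumes `df` is a data frame with a numeric column `feature`)
anomalies <- zscore_anomaly(df$feature)
cat(sprintf("Detected %d anomalies (%.1f%%)\n",
    sum(anomalies), mean(anomalies) * 100))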
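
As a quick check of the z-score and IQR rules on a small hand-made sample (the values are hypothetical), the sketch below shows why the IQR rule is described as robust. The two extreme values inflate the standard deviation enough to hide themselves from the z-score rule, an effect often called masking, while the quartile-based fences are barely affected.

```r
x <- c(10:19, 60, 65)   # ten ordinary values plus two extreme ones

# Z-score rule with the default threshold of 2.5
z      <- (x - mean(x)) / sd(x)
z_flag <- abs(z) > 2.5

# IQR rule with the default multiplier of 1.5
q1  <- quantile(x, 0.25, names = FALSE)
q3  <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1
iqr_flag <- x < (q1 - 1.5 * iqr) | x > (q3 + 1.5 * iqr)

round(z[11:12], 2)   # both z-scores stay below 2.5
sum(z_flag)          # 0: the z-score rule misses both
x[iqr_flag]          # 60 65: the IQR rule catches both
```

With larger samples a few extreme values inflate the standard deviation less, so the two rules tend to agree more often.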

✅ Key Takeaways

  • Anomaly detection finds unusual patterns
  • Trade-off between precision and recall
  • Z-score works for normal data
  • Isolation Forest for high dimensions
  • Define "anomaly" based on business context
  • Use for fraud, sharps, data quality

Pricing Models & Frameworks Tutorial
