Machine Learning Interactive
Anomaly Detection
Find unusual patterns that don't conform to expected behavior. Essential for fraud detection, identifying sharp bettors, and data quality checks.
Types of Anomalies
Point Anomalies
Single data point is unusual
Contextual
Unusual for this context/time
Collective
Group of points is unusual together
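A contextual anomaly can be invisible to a global check. The sketch below is one way to illustrate that, assuming a hypothetical helper `contextual_anomaly` and made-up stake values grouped by a `context` label. It scores each value against the mean and standard deviation of its own group rather than the whole sample.

```r
# Hypothetical helper: z-score each value within its own context group.
contextual_anomaly <- function(x, context, threshold = 2.5) {
  z <- ave(x, context, FUN = function(v) (v - mean(v)) / sd(v))
  abs(z) > threshold
}

# Made-up stakes: weekday stakes sit near 10, weekend stakes near 60.
stakes <- c(10, 11, 9, 10, 12, 10, 9, 11, 10, 60,
            58, 62, 60, 61, 59, 60, 63, 57, 60, 60)
context <- rep(c("weekday", "weekend"), each = 10)

# Global z-score: the weekday stake of 60 looks like an ordinary weekend value.
global_flags <- abs((stakes - mean(stakes)) / sd(stakes)) > 2.5

# Contextual z-score: the same stake stands out among the weekday stakes.
context_flags <- contextual_anomaly(stakes, context)
which(context_flags)
```

With these illustrative numbers, the global check flags nothing, while the per-context check flags only the weekday stake of 60.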
Detection Performance
Precision 75%
Recall 100%
F1 Score 86%
TP: 3
FP: 1
FN: 0
TN: 96
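The percentages above follow directly from the confusion counts. As a minimal sketch (the function name `classification_metrics` is an assumption, not part of the original page), the arithmetic can be reproduced like this:

```r
# Hypothetical helper: precision, recall and F1 from confusion counts.
classification_metrics <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)   # share of flagged points that are real anomalies
  recall <- tp / (tp + fn)      # share of real anomalies that were flagged
  f1 <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f1 = f1)
}

# Counts shown above: TP = 3, FP = 1, FN = 0.
metrics <- classification_metrics(tp = 3, fp = 1, fn = 0)
round(metrics * 100)  # precision 75, recall 100, f1 86
```

True negatives do not enter any of the three metrics, which is why they stay informative even when normal points vastly outnumber anomalies.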
Data Visualization
Legend: normal points, true anomalies, detection boundary
Threshold Trade-off
Lower Threshold
- Higher recall (catch more)
- More false positives
Higher Threshold
- Higher precision (fewer alerts)
- Miss more anomalies
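The trade-off can be seen by sweeping the threshold over a set of scored points. The sketch below assumes a hypothetical helper `sweep_thresholds` and made-up anomaly scores with known labels; it is an illustration, not the page's own implementation.

```r
# Hypothetical helper: precision and recall at each candidate threshold.
sweep_thresholds <- function(scores, is_anomaly, thresholds) {
  rows <- lapply(thresholds, function(t) {
    flagged <- scores > t
    tp <- sum(flagged & is_anomaly)
    fp <- sum(flagged & !is_anomaly)
    fn <- sum(!flagged & is_anomaly)
    data.frame(
      threshold = t,
      precision = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
      recall = tp / (tp + fn)
    )
  })
  do.call(rbind, rows)
}

# Made-up scores (for example absolute z-scores) and ground-truth labels.
scores <- c(0.2, 0.5, 0.8, 1.1, 1.4, 1.7, 2.2, 2.6, 3.1, 3.8)
is_anomaly <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)

result <- sweep_thresholds(scores, is_anomaly, thresholds = c(1.5, 2.5, 3.5))
print(result)
```

On this toy data, recall falls and precision rises as the threshold moves from 1.5 to 3.5.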
Detection Methods
Z-Score
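Distance from the mean, in standard deviations
Best for: Normally distributed, univariate data
IQR
More than 1.5×IQR below Q1 or above Q3
Best for: Skewed or non-normal data
Isolation Forest
Random splits isolate anomalies in fewer steps
Best for: High-dimensional data
DBSCAN
Density-based clustering; points in low-density regions are labeled noise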
Best for: Clusters of anomalies
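To make the density idea concrete, here is a base-R sketch of only the noise-labeling step of DBSCAN (it does not assign cluster ids). The helper `dbscan_noise` and the sample values are assumptions for illustration. A point is treated as noise when it is neither a core point nor within `eps` of one.

```r
# Hypothetical helper: TRUE for points DBSCAN would label as noise.
dbscan_noise <- function(x, eps, min_pts) {
  d <- as.matrix(dist(x))                # pairwise Euclidean distances
  neighbors <- d <= eps                  # each point counts as its own neighbor
  core <- rowSums(neighbors) >= min_pts  # core points have dense neighborhoods
  # Core and border points lie within eps of at least one core point.
  near_core <- rowSums(neighbors[, core, drop = FALSE]) > 0
  unname(!near_core)
}

# Made-up data: a dense group near 1, a sparse pair near 5, one isolated point.
values <- matrix(c(1.0, 1.1, 1.2, 0.9, 1.05, 5.0, 5.1, 10.0), ncol = 1)
noise <- dbscan_noise(values, eps = 0.5, min_pts = 3)
which(noise)
```

With `min_pts = 3`, the pair near 5 is too sparse to form a cluster, so it is flagged together with the isolated point. For real workloads a dedicated implementation is the better choice, since the full distance matrix here grows quadratically with the number of points.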
Betting Applications
Fraud Detection
Unusual betting patterns
Sharp Detection
Closing line value (CLV) higher than expected
Line Errors
Odds far from consensus
Data Quality
Stats outside possible range
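As one illustration of the line-error case, the sketch below compares each book's implied probability with the median across books. The helper `line_error_flags`, the book names, the odds, and the `max_gap` of 0.05 are all hypothetical choices, not values from the original page.

```r
# Hypothetical helper: flag odds whose implied probability is far from consensus.
line_error_flags <- function(decimal_odds, max_gap = 0.05) {
  implied <- 1 / decimal_odds    # implied probability from decimal odds
  consensus <- median(implied)   # median is robust to a single bad line
  abs(implied - consensus) > max_gap
}

# Made-up decimal odds for the same outcome at five books.
odds <- c(book_a = 1.91, book_b = 1.93, book_c = 1.90,
          book_d = 2.35, book_e = 1.92)

flags <- line_error_flags(odds)
names(which(flags))  # "book_d"
```

Using the median rather than the mean keeps the consensus from being pulled toward the erroneous line itself.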
R Code Equivalent
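# Anomaly detection methods
library(isotree)  # provides isolation.forest()

# Z-score method: flag values more than `threshold` standard deviations from the mean
zscore_anomaly <- function(x, threshold = 2.5) {
  # Plain vector arithmetic; scale() would return a one-column matrix
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  abs(z) > threshold
}

# IQR method: flag values more than k * IQR below Q1 or above Q3
iqr_anomaly <- function(x, k = 1.5) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)
  q3 <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  x < (q1 - k * iqr) | x > (q3 + k * iqr)
}

# Isolation Forest: flag the top `contamination` fraction of anomaly scores
iforest_anomaly <- function(df, contamination = 0.05) {
  model <- isolation.forest(df, ntrees = 100)
  scores <- predict(model, df)
  scores > quantile(scores, 1 - contamination)
}

# Apply (assumes `df` is a data frame with a numeric column `feature`)
anomalies <- zscore_anomaly(df$feature)
cat(sprintf("Detected %d anomalies (%.1f%%)\n",
            sum(anomalies, na.rm = TRUE), mean(anomalies, na.rm = TRUE) * 100))
Key Takeaways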
- Anomaly detection finds patterns that deviate from expected behavior
- Thresholds trade precision against recall
- Z-score works for normally distributed data
- Isolation Forest suits high-dimensional data
- Define "anomaly" based on business context
- Use for fraud detection, sharp detection, and data quality