Econometrics Interactive

Propensity Score Matching

Estimate treatment effects when randomization isn't possible. Match treated and control subjects with similar propensity scores to reduce selection bias.

📊 The Selection Bias Problem

When treatment isn't random, treated and control groups differ systematically. A naive comparison of group means conflates the treatment effect with the selection effect.
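The conflation can be written out explicitly. The notation here is an assumption, since the page defines none: Y(1) and Y(0) are a subject's potential outcomes with and without treatment, and T is the treatment indicator. Under that notation, the naive difference in means splits into the effect on the treated plus a selection term:

```latex
\underbrace{E[Y \mid T=1] - E[Y \mid T=0]}_{\text{naive difference}}
= \underbrace{E[Y(1) - Y(0) \mid T=1]}_{\text{ATT}}
+ \underbrace{E[Y(0) \mid T=1] - E[Y(0) \mid T=0]}_{\text{selection effect}}
```

The selection term is zero only if the treated group would have had the same average outcome as the controls had it gone untreated.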

Example

VIP program members have higher LTV. But is it the VIP perks, or were they already high-value customers before joining?

The PSM Solution

1. Estimate the propensity score, P(treatment | covariates)

2. Match each treated unit to controls with a similar score

3. Compare outcomes within matched pairs only

"Compare apples to apples" by matching on treatment propensity.
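The three steps can be sketched in base R without any matching package. This is a minimal illustration on simulated data, not the page's own implementation: the variable names, the simulation coefficients, and the choice of 1:1 nearest-neighbor matching with replacement are all assumptions made for the example.

```r
# Simulated data (hypothetical): x confounds treatment and outcome
set.seed(1)
n <- 500
x <- rnorm(n, 50, 15)                          # e.g. prior activity
treated <- rbinom(n, 1, plogis(-4 + 0.05 * x)) # higher x -> more likely treated
y <- 100 + 0.5 * x + 10 * treated + rnorm(n, 0, 10)  # true effect is 10

# Step 1: estimate P(treatment | covariates) with logistic regression
ps_model <- glm(treated ~ x, family = binomial())
ps <- fitted(ps_model)

# Step 2: for each treated unit, pick the control with the closest score
# (1:1 nearest neighbor, with replacement)
t_idx <- which(treated == 1)
c_idx <- which(treated == 0)
match_idx <- sapply(t_idx, function(i) {
  c_idx[which.min(abs(ps[c_idx] - ps[i]))]
})

# Step 3: compare matched pairs only
att_hat <- mean(y[t_idx] - y[match_idx])
naive   <- mean(y[treated == 1]) - mean(y[treated == 0])
cat("Naive:", round(naive, 2), " Matched ATT:", round(att_hat, 2), "\n")
```

Matching with replacement lets one control serve several treated units, which keeps every treated unit in the estimate at the cost of reusing some controls.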

📊 Effect Estimates

True effect: 10.0

Naive (biased): 15.1 (bias: 5.1)

PSM matched: 12.6 (error: 2.6)
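The bias and error figures are each estimate's distance from the true effect. As a small worked check using only the numbers reported here (the "share removed" summary is an added illustration, not something the page reports):

```r
true_effect <- 10.0
naive_est   <- 15.1
psm_est     <- 12.6

naive_bias <- naive_est - true_effect   # 5.1
psm_error  <- psm_est - true_effect     # 2.6

# Fraction of the naive bias that matching removed in this run
bias_removed <- 1 - psm_error / naive_bias
cat(sprintf("Share of naive bias removed: %.0f%%\n", 100 * bias_removed))
```

In this single simulated run, matching removes about half of the naive bias. It narrows the gap but does not close it.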

Propensity Score Distribution

[Figure: histograms of estimated propensity scores for the Treated and Control groups]

✓ Good overlap in propensity scores - matching feasible
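One rough way to check overlap numerically is to compare the score ranges of the two groups. The sketch below uses simulated data and a simple min/max rule for the common-support region. Both are assumptions for illustration, and a range check is cruder than inspecting the full distributions.

```r
# Simulated data (hypothetical coefficients)
set.seed(2)
n <- 400
x <- rnorm(n, 50, 15)
treated <- rbinom(n, 1, plogis(-2 + 0.05 * x))
ps <- fitted(glm(treated ~ x, family = binomial()))

# Common-support region: scores covered by both groups
lo <- max(min(ps[treated == 1]), min(ps[treated == 0]))
hi <- min(max(ps[treated == 1]), max(ps[treated == 0]))

# Share of treated units whose score falls inside that region
in_support <- ps >= lo & ps <= hi
share_treated <- mean(in_support[treated == 1])
cat(sprintf("Common support: [%.2f, %.2f]; treated inside: %.0f%%\n",
            lo, hi, 100 * share_treated))
```

Treated units outside the region have no comparable control, so an estimate that includes them relies on extrapolation rather than matching.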

Matching Process

1. Estimate propensity: logistic regression for P(T|X)

2. Match: pair each treated unit with a control that has a similar score

3. Compare: ATT = mean(Y_T - Y_C) over matched pairs
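Written out for 1:1 nearest-neighbor matching, the ATT formula looks as follows. The symbols are assumptions introduced for this sketch: N_T is the number of treated units, ê(X) is the estimated propensity score, and j(i) is the control matched to treated unit i.

```latex
\widehat{\mathrm{ATT}}
= \frac{1}{N_T} \sum_{i:\,T_i = 1} \left( Y_i - Y_{j(i)} \right),
\qquad
j(i) = \arg\min_{j:\,T_j = 0} \left| \hat{e}(X_i) - \hat{e}(X_j) \right|
```

Each treated outcome is compared with the outcome of its closest control, and the pairwise differences are averaged over the treated units only.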

🎰 Betting Applications

VIP Effect

Match VIPs to non-VIPs with similar prior activity.

Compare LTV post-enrollment.

Promo Impact

Match promo recipients to non-recipients by propensity.

Estimate true promo ROI.

Feature Adoption

Match users who adopted a new feature to similar users who didn't.

Measure engagement lift.

R Code Equivalent

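# Propensity score matching
library(MatchIt)

# Simulate data: x drives both treatment assignment and outcome (a confounder)
set.seed(42)
n <- 200
x <- rnorm(n, 50, 15)
propensity <- plogis(-2 + 0.05 * x)   # true P(treated | x)
treated <- rbinom(n, 1, propensity)
y <- 100 + 0.5 * x + 10 * treated + rnorm(n, 0, 10)   # true effect is 10
df <- data.frame(x, treated, y)

# Naive comparison (biased: treated units have higher x on average)
naive <- mean(df$y[df$treated == 1]) - mean(df$y[df$treated == 0])
cat("Naive effect:", naive, "\n")

# Propensity score matching (logistic regression, 1:1 nearest neighbor).
# More than half of the units are treated in this simulation, so match with
# replacement; otherwise the controls run out and treated units go unmatched.
m.out <- matchit(treated ~ x, data = df, method = "nearest", replace = TRUE)
m.data <- match.data(m.out)

# Matched comparison (ATT); weights account for controls used more than once
is_t <- m.data$treated == 1
matched_effect <- weighted.mean(m.data$y[is_t], m.data$weights[is_t]) -
                  weighted.mean(m.data$y[!is_t], m.data$weights[!is_t])
cat("Matched effect:", matched_effect, "\n")

# Check balance
summary(m.out)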
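The balance table printed by summary(m.out) includes standardized mean differences. To show what that statistic measures, here is a base-R sketch that computes it by hand before and after a manual nearest-neighbor match. The helper smd() and the matching code are hypothetical illustrations, not MatchIt internals. Dividing by the treated-group standard deviation is one common convention, assumed here.

```r
# Standardized mean difference of covariate x between groups.
# w are optional weights; the denominator is the unweighted treated-group SD
# so that before/after values share the same scale.
smd <- function(x, treated, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treated == 1], w[treated == 1])
  m0 <- weighted.mean(x[treated == 0], w[treated == 0])
  (m1 - m0) / sd(x[treated == 1])
}

set.seed(42)
n <- 200
x <- rnorm(n, 50, 15)
treated <- rbinom(n, 1, plogis(-2 + 0.05 * x))
smd_before <- smd(x, treated)

# Manual 1:1 nearest-neighbor match on the propensity score, with replacement
ps <- fitted(glm(treated ~ x, family = binomial()))
t_idx <- which(treated == 1)
c_idx <- which(treated == 0)
m <- sapply(t_idx, function(i) c_idx[which.min(abs(ps[c_idx] - ps[i]))])

# Weights: 1 for treated units, number of times used for each control
w <- rep(0, n)
w[t_idx] <- 1
w[c_idx] <- tabulate(m, nbins = n)[c_idx]
smd_after <- smd(x, treated, w)

cat("SMD before:", round(smd_before, 3), " after:", round(smd_after, 3), "\n")
```

A commonly cited rule of thumb treats an absolute standardized mean difference below 0.1 as adequate balance, though the threshold is a convention rather than a guarantee.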

✅ Key Takeaways

• PSM reduces selection bias from observed covariates in observational data
• Match on P(treatment | covariates)
• Check for common support (overlap)
• ATT: average treatment effect on the treated
• Requires all confounders to be measured; unmeasured confounders still bias the estimate
• Balance checks are essential after matching