📈 THE INCREMENTALITY MEASUREMENT FRAMEWORK
North Star Setup
🎯 Define One Outcome
Pick one business-linked metric for decisions (e.g., qualified trials, retained revenue), not a vanity proxy.
One metric beats five conflicting ones⏱️ Set Read Windows
Define exactly when outcomes are counted so tests are comparable across channels and cycles.
🧮 Make It Decision-Usable
Your metric must tell you whether to scale, hold, or cut spend this week.
💼 Finance-Ready
If finance wouldn’t care when it moves, it isn’t your North Star metric.
Incrementality Ladder
| Level | Use when | Strength | Weakness |
|---|---|---|---|
| 1. Attribution | Need rapid directional steering | Fast and cheap | Weakest causal truth |
| 2. Pre/Post | Controls limited, medium-stakes decision | Quick calibration | High confounding risk |
| 3. Matched Markets/Cohorts | Need better quasi-experimental confidence | Stronger comparability | More setup and monitoring |
| 4. Randomized Holdout | High-stakes budget or strategy calls | Best causal evidence | Hardest operationally |
How to Run Tests
- Pre-register everything — hypothesis, KPI, duration, stop rules, and decision thresholds.
- Pick highest feasible rigor — don’t pretend attribution is experimentation.
- Protect test integrity — no mid-test KPI changes, no peeking and early stopping.
- Read business + statistical significance — tiny but significant effects usually don’t move the company.
- Reallocate by marginal lift — keep coverage where needed, cut weak marginal spend first.
What Matters vs What Doesn’t
✅ This Matters
- Sustained lift over pre-registered windows
- Result survives replication across cohorts/regions
- Magnitude changes budget decisions
- Findings feed MMM/planning assumptions
🚫 This Doesn’t
- Microscope-level gains with heavy interpretation
- Winner-only reporting after many tests
- Attribution-only proof for budget defense
- One-day significance spikes
Failure Modes to Avoid
🧷 P-Hacking
Stopping tests once significance appears or changing metrics midstream inflates false wins.
🔎 Last-Click Overstatement
High-intent channels (e.g., branded search/app store search) often absorb credit for demand created elsewhere.
🪫 Underpowered Tests
Small samples create noise disguised as confidence.
🧠 No Learning Loop
If insights are not documented and reused, each quarter resets to zero learning.
🧭 Operating Rule
Use attribution for steering, incrementality for truth, and MMM for scale allocation. One system, three roles.
