Question 1

What does statistical significance actually tell me?

Accepted Answer

At 95% confidence (p < 0.05), it tells you that if there were genuinely no difference between your variants, you would see a gap at least this large by random chance less than 5% of the time. That is all. It does not tell you the variant is better, how large the effect is, or whether it matters commercially: a tiny, business-irrelevant difference can be "statistically significant" with enough traffic. Always pair significance with the effect size and a confidence interval.
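
As a minimal sketch of reporting significance together with effect size and a confidence interval, the snippet below runs a standard two-sided two-proportion z-test. The function name `two_proportion_test` and the traffic numbers are made up for illustration; the normal approximation it relies on assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-sided z-test for a difference in conversion rates, plus a CI."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error: computed as if there were no true difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    return {
        "diff": diff,                  # absolute difference in rates
        "relative_lift": diff / p_a,   # effect size relative to control
        "p_value": p_value,
        "ci": ci,
    }


# Hypothetical data: 2,000,000 visitors per variant, 5.00% vs 5.05% conversion.
result = two_proportion_test(100_000, 2_000_000, 101_000, 2_000_000)
print(result)
```

With these invented numbers the p-value falls below 0.05, yet the relative lift is only 1% and the interval sits very close to zero, which is the "significant but possibly irrelevant" case described above.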

Question 2

How much traffic do I need for an A/B test?

Accepted Answer

It depends on your baseline conversion rate and the minimum detectable effect (MDE). At 95% confidence (two-sided) and 80% power: a 5% baseline needs roughly 31,000 visitors per variant to detect a +10% relative lift, or about 8,200 for a +20% lift; a 1% baseline needs roughly 163,000 per variant for +10%. Lower baselines and smaller target effects both require far more traffic. If the required sample exceeds what you can gather in 2–4 weeks, test a bigger change or a higher-converting funnel step rather than compensating by stopping early.
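
The figures above can be approximated with the common normal-approximation formula for comparing two proportions. This is a sketch under that assumption (two-sided test, equal traffic per variant); the function name `visitors_per_variant` is hypothetical, and dedicated calculators may use slightly different formulas and return slightly different numbers.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate sample size per variant for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # relative lift -> absolute target rate

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power requirement

    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


for baseline, lift in [(0.05, 0.10), (0.05, 0.20), (0.01, 0.10), (0.05, 0.05)]:
    n = visitors_per_variant(baseline, lift)
    print(f"baseline {baseline:.0%}, lift +{lift:.0%}: {n:,} per variant")
```

The first three rows land close to the figures quoted in the answer. The last row halves the target lift on a 5% baseline from +10% to +5%, and the required traffic comes out close to four times larger.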

Question 3

What is statistical power, and why does 80% matter?

Accepted Answer

Power is the probability that your test detects a real effect when one truly exists. The 80% convention means you accept a 20% chance of missing a real winner of the size you planned for (a false negative). Underpowered tests, meaning too little traffic for the effect you care about, are among the most common and least visible testing mistakes: they return "no significant difference" on changes that actually worked, so teams stop iterating on ideas that were quietly winning.
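
To see how quickly power drops when traffic is short, here is a rough sketch that estimates power for a given sample size using the normal approximation. The function name `approx_power` and the example traffic levels are assumptions for illustration, not part of any particular tool.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(baseline, relative_lift, n_per_variant, alpha=0.05):
    """Approximate power of a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)

    # Standard error of the observed difference at this sample size.
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)

    # Chance the observed z-score clears the threshold in the direction of
    # the true effect (the negligible opposite-tail term is ignored).
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


# Hypothetical scenario: 5% baseline, true +10% relative lift.
for n in (5_000, 15_000, 31_231):
    print(f"{n:>6,} per variant -> power {approx_power(0.05, 0.10, n):.0%}")
```

Under these assumptions, about 31,000 visitors per variant gives roughly 80% power, while 5,000 per variant gives only about a one-in-five chance of detecting the same real lift.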

Question 4

Can I stop a test early once it hits significance?

Accepted Answer

No — "peeking" and stopping the moment you see p < 0.05 is the fastest way to ship false positives. Significance fluctuates as data accumulates, so if you check repeatedly and stop at the first significant reading, your true false-positive rate climbs well above 5%. Decide the sample size and run length before launching, and read the result once at the end. If you need to monitor continuously, use a sequential testing method designed for it rather than naive peeking.
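
The inflation from peeking can be illustrated with a small simulation of A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. This is a sketch with made-up parameters (1,000 simulated tests, 10 peeks, 100 visitors per arm between peeks, a 20% conversion rate); the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided pooled z-test p-value for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0  # no variation observed yet, nothing to test
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(runs=1000, peeks=10, batch=100, rate=0.2, seed=42):
    """Return (false-positive rate when peeking, rate when reading once)."""
    rng = random.Random(seed)
    stopped_early = significant_at_end = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        hit = False
        for _ in range(peeks):
            # Both arms convert at the same true rate.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            last_p = p_value(conv_a, conv_b, n)
            hit = hit or last_p < 0.05  # would have stopped at this peek
        stopped_early += hit
        significant_at_end += last_p < 0.05
    return stopped_early / runs, significant_at_end / runs


peeking_rate, single_read_rate = simulate_aa_tests()
print(f"stop at first p < 0.05: {peeking_rate:.1%} false positives")
print(f"read once at the end:   {single_read_rate:.1%} false positives")
```

Reading once at the end should stay near the nominal 5%, while stopping at the first significant peek typically produces a rate several times higher. Sequential methods exist precisely to correct for this.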

Question 5

What is the difference between statistical significance and incrementality?

Accepted Answer

Significance tells you a measured difference is unlikely to be due to chance alone. Incrementality tells you whether the conversions were actually caused by your marketing rather than ones that would have happened anyway. A campaign can show a significant in-platform lift that is largely non-incremental, for example when it retargets users who were already going to buy. The most reliable way to measure true incrementality is a controlled holdout or geo test comparing an exposed group to an unexposed control; as a rough rule of thumb, a "good" incremental lift for performance campaigns is about 10–30%.
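
The arithmetic behind a holdout test can be sketched as follows. The function name `incremental_lift` and all numbers are invented for illustration, and the sketch assumes users were randomly assigned to the exposed and holdout groups.

```python
def incremental_lift(exposed_conv, exposed_n, holdout_conv, holdout_n):
    """Estimate incremental lift from a randomized holdout test."""
    exposed_rate = exposed_conv / exposed_n
    holdout_rate = holdout_conv / holdout_n

    # Conversions the exposed group would likely have produced anyway,
    # estimated from the holdout group's conversion rate.
    baseline_conversions = holdout_rate * exposed_n
    incremental = exposed_conv - baseline_conversions

    return {
        "lift": (exposed_rate - holdout_rate) / holdout_rate,
        "incremental_conversions": incremental,
        "incremental_share": incremental / exposed_conv,
    }


# Hypothetical test: 90,000 exposed users with 2,700 conversions,
# 10,000 held-out users with 250 conversions.
result = incremental_lift(2_700, 90_000, 250, 10_000)
print(result)
```

In this made-up case the exposed group converts at 3.0% against 2.5% in the holdout, a 20% incremental lift, so only about one in six of the conversions the platform would report were actually caused by the campaign. A significance test on the two rates is still needed before trusting the estimate.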

Question 6

What is a minimum detectable effect (MDE)?

Accepted Answer

The MDE is the smallest improvement you want your test to be able to catch, for example a +10% relative lift on a 5% conversion rate (i.e., moving it to 5.5%, not 15%). It is an input you choose, and it has an enormous effect on required sample size: the traffic you need scales with roughly the inverse square of the MDE, so halving the MDE roughly quadruples it. Set it to the smallest change that would actually be worth shipping, not the smallest change imaginable.
