Boneyard Tools

A/B Test Significance Calculator

Find out whether the difference between two variants is real or just noise. Enter the visitors and conversions for your control (A) and variation (B), pick a confidence level, and the calculator runs a two-proportion z-test to return the p-value, z-score and a clear significance verdict.

How to use the A/B test significance calculator

  1. Enter the visitors and conversions for variant A (your control).
  2. Enter the visitors and conversions for variant B (your variation).
  3. Choose a confidence level and read the verdict, p-value and uplift.

Examples

A clear winner

A: 1000 visitors, 100 conversions. B: 1000 visitors, 130 conversions. 95% confidence.
Rate A 10%, rate B 13%, uplift 30%, z-score 2.10, p-value 0.035. Significant.

Too close to call

A: 100 visitors, 10 conversions. B: 100 visitors, 11 conversions. 95% confidence.
Rate A 10%, rate B 11%, uplift 10%, z-score 0.23, p-value about 0.82. Not significant; gather more data.

Frequently asked questions

What does statistical significance mean here?

It means the gap between your two conversion rates is unlikely to be down to random chance. A significant result at 95% confidence implies there is less than a 5% probability of seeing a difference at least this large if the variants truly performed the same.

What is the p-value?

The p-value is the probability of observing a difference at least as extreme as yours if the two variants actually had identical conversion rates. A smaller p-value is stronger evidence of a real effect. The test calls a result significant when the p-value falls below 1 minus your confidence level (0.05 at 95%).

How many visitors do I need?

There is no single number. Smaller true differences need larger samples to detect, and for a given relative uplift so do lower baseline conversion rates. If your result is not significant, the usual fix is more traffic per variant. Avoid stopping the test the moment it first turns significant, since that inflates false positives.
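For a rough sense of scale, a common textbook approximation estimates the visitors needed per variant before the test starts. The sketch below is illustrative only and is not part of the calculator: the function name `visitors_per_variant` and the 80% power target are assumptions, and the formula is the standard normal-approximation sample size for comparing two proportions.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_uplift, confidence=0.95, power=0.80):
    """Approximate visitors needed per variant for a two-tailed test.

    Assumptions (not from the calculator itself): equal traffic per
    variant and a power target of 80% by default.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2

    # Critical values from the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_power = NormalDist().inv_cdf(power)

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical inputs: a 10% baseline with shrinking relative uplifts.
for uplift in (0.30, 0.10, 0.05):
    needed = visitors_per_variant(0.10, uplift)
    print(f"{uplift:.0%} uplift on a 10% baseline: about {needed} visitors per variant")
```

The required sample grows roughly with the inverse square of the difference, so halving the uplift you want to detect roughly quadruples the traffic you need.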

Is this a one-tailed or two-tailed test?

It is a two-tailed test, which checks whether B differs from A in either direction (better or worse). Two-tailed is the safer default for most experiments. A one-tailed test only asks whether B is better, which roughly halves the p-value but is easier to misuse.
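The relationship between the two kinds of p-value can be sketched in a few lines. This is a minimal illustration, not the calculator's own code: the function name `p_values_from_z` is hypothetical, and it assumes the z-score is computed as (rate B minus rate A) divided by the standard error, so a positive z means B is ahead.

```python
import math


def p_values_from_z(z):
    """Return (two_tailed, one_tailed) p-values for a z-score.

    The one-tailed value tests only "B is better than A", assuming
    a positive z means B is ahead.
    """
    # P(|Z| >= |z|): a difference this large in either direction.
    two_tailed = math.erfc(abs(z) / math.sqrt(2))
    # P(Z >= z): a difference this large in B's favour only.
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    return two_tailed, one_tailed


for z in (2.10, -2.10):
    two, one = p_values_from_z(z)
    print(f"z = {z:+.2f}: two-tailed p = {two:.4f}, one-tailed p = {one:.4f}")
```

When B is ahead, the one-tailed p-value is exactly half the two-tailed one. When B is behind, the one-tailed p-value is close to 1, so a one-tailed test cannot flag a variation that is significantly worse.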

How is the result calculated?

It runs a two-proportion z-test. The conversions are pooled to estimate a shared rate, that rate sets the standard error of the difference, and the z-score divides the observed difference by that error. The z-score is converted to a two-tailed p-value using the normal distribution.
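The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the calculator's actual source: the function name `two_proportion_z_test` and the shape of the returned dictionary are hypothetical.

```python
import math


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b,
                          confidence=0.95):
    """Pooled two-proportion z-test with a two-tailed p-value."""
    if visitors_a <= 0 or visitors_b <= 0:
        raise ValueError("Each variant needs at least one visitor.")

    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b

    # Pool the conversions to estimate the shared rate under "no difference".
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)

    # The pooled rate sets the standard error of the difference.
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if se == 0:
        raise ValueError("Standard error is zero; the test is undefined.")

    # The z-score divides the observed difference by that error.
    z = (rate_b - rate_a) / se

    # Two-tailed p-value from the normal distribution: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        # Relative uplift is undefined when the control has no conversions.
        "uplift": (rate_b - rate_a) / rate_a if rate_a > 0 else None,
        "z": z,
        "p_value": p_value,
        "significant": p_value < 1 - confidence,
    }


# The "clear winner" example: z is about 2.10 and p is about 0.035.
result = two_proportion_z_test(1000, 100, 1000, 130)
print(f"z = {result['z']:.2f}, p = {result['p_value']:.3f}, "
      f"significant = {result['significant']}")
```

Using `math.erfc` keeps the sketch dependency-free; it gives the same two-tailed area as a normal-distribution table.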

Does my data leave my browser?

No. The calculation runs entirely in your browser. Your visitor and conversion numbers are never uploaded or stored.
