Boneyard Tools

Estimating carrier frequency with Hardy-Weinberg

How to turn the frequency of a recessive disease into carrier and allele frequencies using the Hardy-Weinberg equation, with a worked example.

Why carriers are hidden in the numbers

Many recessive conditions appear only in people who carry two copies of the recessive allele, the aa genotype. Heterozygous Aa carriers show no symptoms, so a simple count of affected individuals misses them. The Hardy-Weinberg equation lets you work backward from the frequency of affected people to an estimate of how many silent carriers are in the population, which is exactly the kind of question genetic counselors face.

The one value you need

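Under Hardy-Weinberg equilibrium the genotype frequencies AA, Aa and aa are p squared, 2pq and q squared, where p and q are the frequencies of the dominant and recessive alleles and p plus q equals 1. For a recessive disease the affected frequency is the aa frequency, q squared. If a condition affects 1 in 2500 births, then q squared is 0.0004 and q is its square root, 0.02. The dominant allele frequency p is then 1 minus 0.02, or 0.98. Entering aa as 0.0004 in this calculator reproduces those allele frequencies directly and also fills in the AA and Aa genotype frequencies for you.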
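The same step can be sketched in a few lines of Python. This is an illustration of the arithmetic, not the calculator's own code, and the function name allele_frequencies is an assumption made for the example.

```python
import math

def allele_frequencies(affected_freq):
    """Return (p, q) from the frequency of affected (aa) individuals.

    Hypothetical helper: assumes Hardy-Weinberg equilibrium and a purely
    recessive condition, so the affected frequency equals q squared.
    """
    if not 0.0 <= affected_freq <= 1.0:
        raise ValueError("affected_freq must be between 0 and 1")
    q = math.sqrt(affected_freq)  # recessive allele frequency
    p = 1.0 - q                   # dominant allele frequency
    return p, q

# Worked example from the text: 1 in 2500 births affected.
p, q = allele_frequencies(1 / 2500)
print(f"q = {q:.4f}, p = {p:.4f}")
```

For an input of 1 in 2500 this gives q of about 0.02 and p of about 0.98, matching the values above.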

Reading the carrier frequency

The carrier frequency is the heterozygous term Aa, equal to 2pq. With p of 0.98 and q of 0.02 that is 2 times 0.98 times 0.02, which is 0.0392, or about 1 in 26 people. That single number is often the most useful output: it shows that even a rare disease, affecting only 1 in 2500 births, can hide carriers in roughly 4 percent of the population, about 98 times the affected frequency.
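As a rough sketch of what the filled-in genotype frequencies look like, the Python below computes all three terms from the affected frequency. The function name genotype_frequencies and the dictionary layout are assumptions for illustration only.

```python
import math

def genotype_frequencies(affected_freq):
    """Return AA, Aa and aa frequencies from the affected (aa) frequency.

    Hypothetical helper: assumes Hardy-Weinberg equilibrium.
    """
    if not 0.0 <= affected_freq <= 1.0:
        raise ValueError("affected_freq must be between 0 and 1")
    q = math.sqrt(affected_freq)
    p = 1.0 - q
    return {
        "AA": p * p,      # unaffected non-carriers
        "Aa": 2 * p * q,  # carriers
        "aa": q * q,      # affected
    }

freqs = genotype_frequencies(0.0004)
carrier = freqs["Aa"]
print(f"carrier frequency = {carrier:.4f}, about 1 in {1 / carrier:.1f}")
```

The three values always sum to 1, which is a quick sanity check on any hand calculation.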

Where the estimate breaks down

The carrier estimate is only as good as the equilibrium assumptions behind it. Founder effects, consanguineous marriage, selection against the disease and recent migration can all push real carrier frequencies away from the Hardy-Weinberg value. The method also assumes the disease is purely recessive with full penetrance. Treat the result as a first-order screening estimate and confirm important cases with direct genetic testing.

Frequently asked questions

If a disease affects 1 in 10000, what is the carrier frequency?

Here aa is 0.0001, so q is 0.01 and p is 0.99. The carrier frequency 2pq is 2 times 0.99 times 0.01, which is 0.0198, or roughly 1 in 50 people.
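To compare incidences side by side, a small Python loop is enough. This sketch uses the two incidences mentioned in this article; the helper name carrier_frequency and its "1 in N" style argument are assumptions for the example.

```python
import math

def carrier_frequency(incidence_denominator):
    """Carrier frequency 2pq for a disease affecting 1 in N births.

    Hypothetical helper: assumes Hardy-Weinberg equilibrium and a purely
    recessive condition, so the affected frequency 1/N equals q squared.
    """
    if incidence_denominator < 1:
        raise ValueError("incidence_denominator must be at least 1")
    q = math.sqrt(1.0 / incidence_denominator)
    p = 1.0 - q
    return 2.0 * p * q

for n in (2500, 10000):
    carrier = carrier_frequency(n)
    print(f"1 in {n} affected: carrier frequency {carrier:.4f}, "
          f"about 1 in {1 / carrier:.1f}")
```

A fourfold drop in disease incidence only halves the carrier frequency, because the carrier term depends on q rather than on q squared.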

Why is the carrier frequency so much higher than the disease frequency?

When the recessive allele is rare, most of its copies sit in healthy heterozygotes rather than in affected homozygotes. The ratio of carriers to affected individuals is 2pq divided by q squared, or 2p divided by q, which grows as q shrinks: with q of 0.02 it is 98 to 1, and with q of 0.01 it is 198 to 1.
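Both claims can be checked numerically. The Python sketch below assumes Hardy-Weinberg equilibrium, and its function names are made up for illustration. It computes how many carriers there are per affected person, and what share of all recessive allele copies sits in carriers.

```python
import math

def carriers_per_affected(affected_freq):
    """Ratio of carriers (2pq) to affected individuals (q squared)."""
    if not 0.0 < affected_freq <= 1.0:
        raise ValueError("affected_freq must be greater than 0 and at most 1")
    q = math.sqrt(affected_freq)
    p = 1.0 - q
    return (2 * p * q) / (q * q)  # algebraically 2p / q

def recessive_share_in_carriers(affected_freq):
    """Fraction of all recessive allele copies held by carriers."""
    if not 0.0 < affected_freq <= 1.0:
        raise ValueError("affected_freq must be greater than 0 and at most 1")
    q = math.sqrt(affected_freq)
    p = 1.0 - q
    in_carriers = 2 * p * q  # one copy per Aa individual
    in_affected = 2 * q * q  # two copies per aa individual
    return in_carriers / (in_carriers + in_affected)  # algebraically p

ratio = carriers_per_affected(0.0004)
share = recessive_share_in_carriers(0.0004)
print(f"about {ratio:.0f} carriers per affected person")
print(f"about {share:.0%} of recessive alleles are in carriers")
```

For the 1 in 2500 example this works out to about 98 carriers per affected person, with about 98 percent of the recessive allele copies carried by people who show no symptoms.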