Boneyard Tools

Counting alleles versus predicting them

Why observed allele counting and Hardy-Weinberg prediction are different steps, and how p and q connect the two in population genetics.

Two questions, two methods

Population genetics asks two related but distinct questions about a locus. The first is descriptive: given the genotypes we actually see, what are the allele frequencies right now? The second is predictive: if mating were random and no forces disturbed the pool, what genotype frequencies would we expect in the next generation? This calculator answers only the first, by direct counting. Mixing the two up is a common source of confusion in introductory courses.

How direct counting works

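Direct counting needs no assumptions about mating or equilibrium. It only requires a diploid locus with two alleles whose three genotypes can be told apart. You tally every allele copy in the sample, treating each AA person as two A copies, each Aa person as one A and one a, and each aa person as two a copies. Dividing each allele total by the full pool of 2N copies, where N is the number of individuals, gives the frequency of A, usually written p, and the frequency of a, written q. For 320 AA, 160 Aa and 20 aa individuals, that yields 800 A copies and 200 a copies out of 1000, so p is 0.8 and q is 0.2.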
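The tally above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code. The function name allele_frequencies and its parameter names are made up for this example.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) by direct counting at a two-allele locus in diploids.

    Hypothetical helper for illustration; the names are not from any library.
    """
    n_individuals = n_AA + n_Aa + n_aa
    if n_individuals == 0:
        raise ValueError("need at least one genotyped individual")
    total_copies = 2 * n_individuals   # every individual carries two copies
    copies_A = 2 * n_AA + n_Aa         # AA contributes two A, Aa contributes one
    copies_a = 2 * n_aa + n_Aa         # aa contributes two a, Aa contributes one
    return copies_A / total_copies, copies_a / total_copies


p, q = allele_frequencies(320, 160, 20)
print(p, q)  # prints 0.8 0.2
```

Nothing in this function refers to Hardy-Weinberg. It only counts, which is why it works for any sample regardless of how the population mates.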

Where p and q lead next

Once you have p and q, the Hardy-Weinberg principle predicts the genotype frequencies you would expect under equilibrium: p squared for AA, 2pq for Aa and q squared for aa. With p at 0.8 and q at 0.2 that predicts 0.64 AA, 0.32 Aa and 0.04 aa, which for 500 people is 320, 160 and 20. In this particular sample the observed and expected counts match, but real populations often differ.
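The prediction step can be sketched the same way. The snippet below is a minimal illustration under the two-allele assumption. The function name hardy_weinberg_expected is hypothetical, and it takes p as an input rather than counting it, to stress that this is a separate step from counting.

```python
def hardy_weinberg_expected(p, n_individuals):
    """Return expected (AA, Aa, aa) counts under Hardy-Weinberg proportions.

    Hypothetical helper for illustration; assumes exactly two alleles,
    so q is taken as 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    return (
        p * p * n_individuals,       # expected AA count: p squared times N
        2 * p * q * n_individuals,   # expected Aa count: 2pq times N
        q * q * n_individuals,       # expected aa count: q squared times N
    )


exp_AA, exp_Aa, exp_aa = hardy_weinberg_expected(0.8, 500)
print(f"{exp_AA:.1f} {exp_Aa:.1f} {exp_aa:.1f}")  # prints 320.0 160.0 20.0
```

The expected counts are generally not whole numbers, so they are kept as floats rather than rounded.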

Why observed and expected can diverge

A gap between the genotypes you counted and the Hardy-Weinberg prediction is informative rather than an error. Selection, non-random mating, small population size, migration and mutation all push genotype frequencies away from the idealized model. Comparing your observed genotype counts against the expected counts, often with a chi-squared test, is how researchers detect that one of those forces may be acting on the locus. Small gaps can also arise from sampling chance alone, which is why a formal test is used rather than a comparison by eye.
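As a rough sketch of that comparison, the snippet below computes a chi-squared goodness-of-fit statistic using only the Python standard library. The function name hwe_chi_squared is hypothetical. Because p is estimated from the same sample, the test has one degree of freedom (three genotype classes, minus one, minus one estimated parameter), and for one degree of freedom the p-value can be written with the complementary error function. The second sample below (340, 120, 40) is made-up data chosen to show a heterozygote deficit, not a real dataset.

```python
import math


def hwe_chi_squared(n_AA, n_Aa, n_aa):
    """Return (chi2, p_value) for a Hardy-Weinberg goodness-of-fit test.

    Hypothetical helper for illustration; assumes a two-allele locus
    and one degree of freedom.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    p = (2 * n_AA + n_Aa) / (2 * n)   # allele frequency by direct counting
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Classes with zero expected count (an allele is absent) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# The sample from the text matches its expectation, so chi2 is essentially 0.
print(hwe_chi_squared(320, 160, 20))

# Made-up sample with the same p = 0.8 but too few heterozygotes.
chi2, p_value = hwe_chi_squared(340, 120, 40)
print(round(chi2, 2))  # prints 31.25
```

A statistic above roughly 3.84 corresponds to a p-value below 0.05 at one degree of freedom. The chi-squared approximation becomes unreliable when any expected count is small (a common rule of thumb is below 5), in which case an exact test is usually preferred.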

Frequently asked questions

What do p and q stand for?

By convention p is the frequency of the dominant allele A and q is the frequency of the recessive allele a. With only two alleles at the locus, the two frequencies must sum to one, so p plus q equals one. Squaring that sum gives the Hardy-Weinberg genotype expansion: p squared plus 2pq plus q squared equals one.

If my counts match Hardy-Weinberg, is the population in equilibrium?

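A single matching sample is consistent with equilibrium but does not prove it. Equilibrium is a statement about allele and genotype frequencies staying constant from one generation to the next, so one snapshot only tells you the current genotypes are not obviously disturbed. Some forces, or several acting in opposite directions, can also leave genotype proportions close to the Hardy-Weinberg expectation.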