Boneyard Tools

Covariance vs correlation

How covariance and correlation relate, why one has units and the other does not, and when to reach for each when comparing two variables.

They share the same core

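Covariance and the Pearson correlation coefficient both start from the same quantity. For each paired observation, multiply x's deviation from its mean by y's deviation from its mean, then add up those products. Covariance stops there, dividing that sum by n (the population version) or n minus 1 (the sample version) to get an average product of deviations. Correlation goes one step further and divides the covariance by the product of the two standard deviations. So correlation is simply a rescaled covariance, and because the standard deviations are positive, the two always share the same sign: if covariance is positive, correlation is positive, and if one is zero, so is the other. The one exception is a variable with no spread at all. Its standard deviation is zero, so its covariance with anything is zero while the correlation is undefined.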
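The shared core can be sketched in plain Python as a minimal illustration. The helper names (`covariance`, `std_dev`, `correlation`) and the hours/scores data are hypothetical. In practice you would likely use `statistics.covariance` and `statistics.correlation` (Python 3.10+) or a numerical library instead.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(x, y, sample=True):
    """Average product of paired deviations from the means.

    sample=True divides by n - 1; sample=False divides by n.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    # The shared core: sum of products of paired deviations.
    sum_products = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    denom = len(x) - 1 if sample else len(x)
    return sum_products / denom

def std_dev(values, sample=True):
    # A variable's variance is its covariance with itself.
    return sqrt(covariance(values, values, sample))

def correlation(x, y):
    # Pearson r: covariance rescaled by the product of the standard deviations.
    # Raises ZeroDivisionError if either variable is constant (r is undefined).
    return covariance(x, y) / (std_dev(x) * std_dev(y))

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

print(covariance(hours, scores))   # positive covariance...
print(correlation(hours, scores))  # ...so correlation is positive too
```

Reversing the scores would make both numbers negative at once, since only the positive standard deviations separate them.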

Units are the key difference

Covariance carries the units of both variables multiplied together. If x is in dollars and y is in hours, the covariance is in dollar-hours, a quantity that is hard to interpret on its own. That also makes covariance sensitive to scale: measure the same data in cents instead of dollars and the covariance jumps a hundredfold even though the relationship is unchanged. Correlation strips the units away by dividing by the standard deviations, leaving a pure number between -1 and 1 that means the same thing regardless of the units you measured in.
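The dollars-to-cents example can be checked numerically. This is a small sketch with made-up spending and hours figures; the helper functions are hypothetical and use the sample (n minus 1) convention throughout.

```python
from math import sqrt

def covariance(x, y):
    # Sample covariance: sum of paired deviation products divided by n - 1.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    # Pearson r: covariance divided by the product of the standard deviations.
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: amount spent in dollars and hours worked.
dollars = [12.0, 15.5, 9.0, 20.0, 17.5]
hours = [3.0, 4.0, 2.5, 5.5, 4.5]
cents = [d * 100 for d in dollars]  # same amounts, different unit

cov_dollars = covariance(dollars, hours)  # in dollar-hours
cov_cents = covariance(cents, hours)      # in cent-hours
print(cov_cents / cov_dollars)            # 100, up to floating-point rounding
print(correlation(dollars, hours), correlation(cents, hours))  # equal up to rounding
```

The factor of 100 appears in both the covariance and the standard deviation of x, so it cancels out of the correlation.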

When to use each

Reach for covariance when the scale matters and you plan to feed it into further math. Examples include a covariance matrix, portfolio variance, or the slope of a simple least-squares regression of y on x, which is the covariance of x and y divided by the variance of x. Reach for correlation when you want to compare the strength of relationships across different pairs of variables, because its fixed -1 to 1 range makes those comparisons fair. A covariance of 500 sounds large but could reflect a weak link between big numbers, while a correlation of 0.9 always signals a strong linear one.
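As one example of feeding covariance into further math, here is a minimal sketch of a one-predictor least-squares fit built from the slope formula above. The `fit_line` helper and the data are hypothetical; Python 3.10+ also ships `statistics.linear_regression` for the same job.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    # Sample covariance (divide by n - 1).
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x for a single predictor."""
    # The n - 1 factors in covariance and variance cancel, so either convention works.
    slope = covariance(x, y) / covariance(x, x)  # cov(x, y) / var(x)
    intercept = mean(y) - slope * mean(x)        # the fitted line passes through the means
    return slope, intercept

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(x, y)
print(slope, intercept)  # roughly 1.99 and 0.05
```

Here the slope keeps the units of y per unit of x, which is exactly why the unscaled covariance, not the correlation, is the right input.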

What neither one proves

Both measures capture only the linear part of a relationship, so a strong curved pattern can produce a covariance and correlation near zero even though the variables are tightly linked. They are also both sensitive to outliers, since a single extreme pair can dominate the sum of products. And critically, neither implies causation: two variables can move together because one drives the other, because a third factor drives both, or by pure coincidence. Treat a nonzero covariance as a prompt to investigate, not as proof of a cause.
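Both failure modes are easy to reproduce. The sketch below uses hypothetical data and a hypothetical `correlation` helper: a perfect parabola that Pearson r scores as zero, and a weakly related set of points that a single extreme pair turns into an apparently strong relationship.

```python
from math import sqrt

def correlation(x, y):
    # Pearson r from paired deviations; the 1/(n - 1) factors cancel.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# A perfect but curved relationship: y is completely determined by x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]
print(correlation(x, y))  # 0.0: the linear measure misses the parabola

# Weakly related points, then the same points plus one extreme pair.
a = [1, 2, 3, 4, 5]
b = [2, 5, 1, 4, 3]
print(correlation(a, b))                # weak: about 0.1
print(correlation(a + [50], b + [60]))  # one outlier pushes r above 0.99
```

Neither number says anything about why x and y move together; that question needs a look at the data and the process that produced it.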

Frequently asked questions

Can I get correlation from covariance?

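Yes. Divide the covariance by the product of the two standard deviations and you get the Pearson correlation coefficient, a unitless value between -1 and 1. Use the same convention (n or n minus 1) for the covariance and both standard deviations so the divisors cancel; mixing them scales the result by n / (n minus 1) or its inverse.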
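A minimal sketch of the conversion, with hypothetical helpers and data. The `ddof` parameter name is borrowed from NumPy's convention for illustration only; here it is just a plain argument to a hand-written function.

```python
from math import sqrt

def covariance(x, y, ddof=1):
    # ddof=1 divides by n - 1 (sample); ddof=0 divides by n (population).
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - ddof)

def correlation_from_covariance(cov_xy, var_x, var_y):
    # r = cov(x, y) / (sd_x * sd_y), where each sd is the square root of a variance.
    return cov_xy / (sqrt(var_x) * sqrt(var_y))

# Hypothetical data.
x = [4.0, 7.0, 1.0, 9.0, 6.0]
y = [10.0, 14.0, 5.0, 17.0, 11.0]

for ddof in (1, 0):
    r = correlation_from_covariance(
        covariance(x, y, ddof), covariance(x, x, ddof), covariance(y, y, ddof)
    )
    print(ddof, r)  # same r either way: the divisor cancels

# Mixing conventions (sample covariance, population variances) inflates r by n / (n - 1).
mixed = correlation_from_covariance(
    covariance(x, y, 1), covariance(x, x, 0), covariance(y, y, 0)
)
print(mixed)  # exceeds 1 for this data, which a true correlation never does
```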

Why does correlation stay between -1 and 1 but covariance does not?

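By the Cauchy-Schwarz inequality, the absolute value of the covariance can never exceed the product of the two standard deviations, so dividing by that product yields a value between -1 and 1. The endpoints are reached only when the points lie exactly on a straight line. Covariance keeps the raw units and scale, so its magnitude can be any size.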
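A short derivation of that bound, written with the sample divisor n minus 1 (the population version works the same way because the divisor cancels). Cauchy-Schwarz is applied to the two vectors of deviations:

```latex
\begin{aligned}
\operatorname{cov}(x,y) &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}),
\qquad s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \\
\Bigl|\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\Bigr|
&\le \sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}
\quad\text{(Cauchy-Schwarz)} \\
\Longrightarrow\quad |\operatorname{cov}(x,y)| &\le s_x s_y
\quad\Longrightarrow\quad -1 \le r = \frac{\operatorname{cov}(x,y)}{s_x s_y} \le 1
\end{aligned}
```

Dividing the second line by n minus 1 turns each side into the quantities defined in the first line. Equality holds only when the deviations of y are a constant multiple of the deviations of x, which is the straight-line case; the sign of that multiple decides between 1 and -1.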