Colin Jaffe/2 min read

Variance: The Core of Statistical Analysis

Why Variance Matters

Measures Spread

Average squared distance from the mean — how scattered the data is.

Foundation for Stats

Standard deviation, Z-scores, and most tests build on variance.

Bias-Variance Trade-off

Core ML concept — too little variance = underfit, too much = overfit.

Feature Selection

Low-variance features carry little information — often safe to drop.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

The next measurement that we might look at is the variance. I alluded to this in a previous video. Variance is the square of the standard deviation (standard deviation being the way a human would think about spread), and our common symbol for it is sigma squared, just as sigma represents standard deviation.

But mathematically, when doing statistical work, variance is actually the really important one. And you would think of standard deviation as, in fact, the square root of variance: you take the square root of sigma squared to get back to sigma. Variance is the more useful numerical calculation.

It's what standard deviation is derived from. Standard deviation is more useful when you're a human trying to judge a population and examine its distribution. There, standard deviation is what you want.

But when you're actually doing calculations, variance is the original. Variance is the one that's more mathematically useful. Let's take a look at how we would calculate variance.
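Before reaching for a library call, it may help to see the definition written out by hand. This is a minimal sketch in plain Python, assuming a small made-up list of temperatures (the `temps` values are hypothetical, not the data used in the lesson). It computes the average squared distance from the mean, both the population version (divide by n) and the sample version (divide by n - 1).

```python
# Hypothetical temperature readings, chosen for illustration only.
temps = [62, 71, 75, 79, 84, 88, 94]

n = len(temps)
mean = sum(temps) / n

# Squared distance of each value from the mean.
squared_devs = [(t - mean) ** 2 for t in temps]

# Population variance: average of the squared distances (divide by n).
pop_var = sum(squared_devs) / n

# Sample variance: divide by n - 1 instead (Bessel's correction).
sample_var = sum(squared_devs) / (n - 1)

# Standard deviation is the square root of the variance.
sample_std = sample_var ** 0.5

print("mean:", mean)
print("population variance:", pop_var)
print("sample variance:", sample_var)
print("sample standard deviation:", sample_std)
```

Notice that the variance comes first and the standard deviation is obtained from it, which is the sense in which variance is "the original."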

And again, you could just take the standard deviation and square it. But there's a direct way to measure it: np.var, for variance. We pass in the list that we want.

And again, we set the delta degrees of freedom, ddof, to one. That divides by n - 1 and gives the sample variance; a ddof of zero, NumPy's default, would give the population variance. And if we look at that, it is 208, which is approximately 14.4 squared. And we can even check that by raising the standard deviation to the power of two.

We see it's the same value. So, variance is useful when you are doing mathematical, statistical work based on how much values generally vary from the mean. Variance (from 'vary') is the better mathematical measurement of this variation, even though we typically, as humans, look at standard deviation, because instead of being a larger squared value, it's on the same scale as the original values.
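The comparison described above can be sketched with NumPy roughly as follows. The `temps` array is a hypothetical stand-in, since the lesson's actual temperature list is not shown here, so the numbers will differ from the 208 mentioned above. Passing `ddof=1` to `np.var` and `np.std` divides by n - 1 (the sample versions); leaving it out uses NumPy's default of `ddof=0` (the population versions).

```python
import numpy as np

# Hypothetical temperature readings, not the data from the lesson.
temps = np.array([62, 71, 75, 79, 84, 88, 94])

# Sample variance and sample standard deviation (divide by n - 1).
sample_var = np.var(temps, ddof=1)
sample_std = np.std(temps, ddof=1)

# Population variance uses the default ddof=0 (divide by n).
pop_var = np.var(temps)

print("variance:", sample_var)
print("standard deviation squared:", sample_std ** 2)

# The two agree up to floating-point rounding.
print("same value:", np.isclose(sample_var, sample_std ** 2))
```

Whichever `ddof` you pick, use the same one for both calls; squaring a `ddof=1` standard deviation will not match a `ddof=0` variance.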

If I say that the standard deviation is roughly 14 degrees, then you know that roughly two thirds of the temperatures will be within 14 degrees of our mean of 79, assuming the temperatures are distributed in a roughly bell-shaped way. So, that means about two thirds of them are in the 65 to 93 range. However, when we're performing mathematical calculations based on how varied the values are, that's when we would use variance.
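The "two thirds within one standard deviation" rule of thumb can be checked with a quick simulation. This is only a sketch under stated assumptions: the temperatures here are simulated from a normal distribution with a mean of 79 and a standard deviation of 14.4 (the figures quoted above), not taken from the lesson's data, and the rule holds only to the extent that real data is roughly bell-shaped.

```python
import numpy as np

# Simulated temperatures: an assumption for illustration, not real data.
rng = np.random.default_rng(0)
temps = rng.normal(loc=79, scale=14.4, size=10_000)

mean = temps.mean()
std = temps.std(ddof=1)

# Boolean mask: True where a value lies within one standard deviation.
within_one_std = np.abs(temps - mean) <= std

# The mean of a boolean array is the fraction of True values.
fraction = within_one_std.mean()

print("range:", round(mean - std), "to", round(mean + std))
print("fraction within one standard deviation:", fraction)
```

For normally distributed data the fraction should come out near 0.68, which is where "roughly two thirds" comes from.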