Colin Jaffe/4 min read

Standard Deviation and the Bell Curve

Bell Curve Properties

68-95-99.7 Rule

68% of values within 1σ; 95% within 2σ; 99.7% within 3σ.

Symmetry

Normal distribution is symmetric around the mean.

Z-Score

z = (x - μ) / σ expresses any value as a number of standard deviations from the mean.

Use in ML

Foundation for many algorithms — Naive Bayes, anomaly detection, etc.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Explain standard deviation as a measure of how values spread out from the mean in a normal distribution. Watch this tutorial to learn the key concepts and techniques.

Let's talk about standard deviation. It's a measure of how far values typically fall from the mean. We measure from the mean because the important question is, "Where is the middle of our values?"

And how much do things deviate from that? There's a distribution called the normal distribution, so named because it's very common. In a normal distribution, most values cluster around the mean, and the distribution is symmetrical. Its shape is called a bell curve.

What does that mean? It means that on a bell curve, we have a few outliers, and most values are right in the middle, so the curve rises sharply there. The standard deviation measures how much the values vary from the mean.

In a normal distribution, one standard deviation on either side of the mean encompasses about 68.2% of the values. So about two thirds of all values will be within one standard deviation.

That gives us a way to picture the standard deviation in a normal distribution: what difference, plus or minus from the mean, would encompass this percentage? For two standard deviations, we take that same difference between the middle and our value, again plus or minus, and double it.


Within two standard deviations we'll find 95.4% of our values. With three standard deviations, we're getting almost all of our values: fewer than three out of every 1,000 values will be more than three standard deviations from the mean.
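As a quick check on those percentages, here is a minimal sketch that computes them directly rather than reading them off a chart. It assumes a perfectly normal distribution, and the helper name `fraction_within` is made up for illustration. It relies on the standard identity that the share of a normal distribution within k standard deviations of the mean is erf(k / √2).

```python
import math


def fraction_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    # For any normal distribution, P(|X - mean| <= k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Print each share as a percentage with two decimal places.
    print(f"within {k} standard deviation(s): {fraction_within(k):.2%}")
```

The exact figure for one standard deviation is a little under 68.3%. Charts that label each half as 34.1% add up to the 68.2% quoted here.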

So let's take a look at what this looks like. If we execute this cell, it'll load an image from our Google Drive. This is a visualization of the bell curve we've been talking about, so called because it's roughly bell-shaped.

The 68.2% here is these two middle sections: half of it to the left of the mean, half of it to the right. These lines mark the number of standard deviations from the mean; the symbol for standard deviation is sigma, the Greek letter σ. A smaller share of values falls between one and two standard deviations away.

And a very small share is more than two standard deviations away: another four and a half percent or so. Beyond three standard deviations is the last 0.1% or so on each side, the extreme outliers. You see this kind of distribution all the time.
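The bands in the picture can be reproduced with Python's built-in `statistics.NormalDist`. This is a rough sketch, not the notebook's own code: it assumes an idealized standard normal distribution (mean 0, standard deviation 1), and the names `share_between` and `bands` are invented for this example.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1, so x is already in sigma units.
standard_normal = NormalDist(mu=0, sigma=1)


def share_between(low, high):
    """Share of values falling between low and high standard deviations."""
    return standard_normal.cdf(high) - standard_normal.cdf(low)


# Shares on ONE side of the mean; the other side mirrors these by symmetry.
bands = {
    "0 to 1 sigma": share_between(0, 1),
    "1 to 2 sigma": share_between(1, 2),
    "2 to 3 sigma": share_between(2, 3),
    "beyond 3 sigma": 1 - standard_normal.cdf(3),
}

for label, share in bands.items():
    print(f"{label}: {share:.2%} per side")
```

Because the curve is symmetric, the four one-sided bands add up to half of all values.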

Height for humans is a good example. Let's say that for a White male in America, the average height is five-foot-nine.


And you see White men in America tightly clustered around that height, within two or three inches. Fewer people are shorter than five-foot-six, again for this subset of the population. So five-foot-six might be one standard deviation below the mean, if we estimate a standard deviation at three inches.

Going the other direction, six feet tall would be the same deviation from five-foot-nine. So if three inches is a standard deviation, most people, about two thirds, would be between five-foot-six and six feet. Then you have your more extreme outliers: your very short people on one side, and your basketball players on the other.
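To make the height example concrete, here is a small sketch using the rough numbers from above: a mean of 69 inches (five-foot-nine) and an estimated standard deviation of 3 inches. Both figures are the estimates used in this lesson, not measured data, and treating height as exactly normal is an assumption. The helper `z_score` and the constant names are made up for illustration.

```python
from statistics import NormalDist

# Rough figures from the lesson, in inches (estimates, not measured data).
MEAN_HEIGHT_IN = 69.0  # five-foot-nine
SD_HEIGHT_IN = 3.0     # assumed standard deviation

heights = NormalDist(mu=MEAN_HEIGHT_IN, sigma=SD_HEIGHT_IN)


def z_score(x, mean=MEAN_HEIGHT_IN, sd=SD_HEIGHT_IN):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Five-foot-six is 66 inches; six feet is 72 inches.
print("z-score for 66 inches:", z_score(66))
print("z-score for 72 inches:", z_score(72))

# Share of this idealized population between 66 and 72 inches.
share = heights.cdf(72) - heights.cdf(66)
print(f"share between 5'6\" and 6'0\": {share:.1%}")
```

Under these assumptions, five-foot-six and six feet sit exactly one standard deviation below and above the mean, so the share between them matches the "about two thirds" figure.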

Next, we're going to take a look at how to calculate the standard deviation.