Colin Jaffe/3 min read

Visualizing Normal Distribution with NumPy

Normal Distribution Plot

1. Generate Samples: np.random.normal(mu, sigma, size=10000).
2. Histogram: plt.hist(samples, bins=50, density=True).
3. Overlay PDF: Plot scipy.stats.norm.pdf for the theoretical curve.
4. Verify Stats: samples.mean() ≈ mu, samples.std() ≈ sigma.
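The four outline steps can be sketched as one script. This is a minimal sketch, assuming mu = 100 and sigma = 15 (the values used in the walkthrough below) and a fixed seed so reruns are reproducible; the seed, labels, and plotting extras are illustrative choices rather than part of the original outline.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu, sigma = 100, 15  # assumed mean and standard deviation
np.random.seed(0)    # illustrative seed so reruns give the same samples

# 1. Generate samples from a normal distribution
samples = np.random.normal(mu, sigma, size=10000)

# 2. Histogram; density=True rescales bar heights so the total area is 1
plt.hist(samples, bins=50, density=True, alpha=0.6, label="samples")

# 3. Overlay the theoretical probability density function
x = np.linspace(samples.min(), samples.max(), 200)
plt.plot(x, norm.pdf(x, loc=mu, scale=sigma), label="theoretical PDF")
plt.legend()
plt.show()

# 4. Verify that the sample statistics land close to the parameters
print(f"mean: {samples.mean():.2f} (mu = {mu})")
print(f"std:  {samples.std():.2f} (sigma = {sigma})")
```

The density=True flag matters for step 3: without it the bars show raw counts, which would dwarf a PDF whose values are well below 1.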


This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.


If we take a look at a normal distribution now, again like the bell curve, we can use a NumPy method to get normally distributed random numbers. Let's call the result scores, and let's make it 1,000 scores.

And this time we'll call np.random.normal, as opposed to the uniform distribution we made before. We specify the mean, the value the numbers cluster around (100 here), and then we ask for a standard deviation of 15, please.

Maybe we're not polite, but I like to be polite. Then we say how many values we want. So instead of a range like before, we give a center and a standard deviation.
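To make the contrast with the uniform distribution concrete, here is a small sketch. The uniform range of 50 to 150 is an assumption chosen for comparison (the earlier lesson's exact arguments aren't shown here), and the seed is illustrative.

```python
import numpy as np

np.random.seed(1)  # illustrative seed for reproducibility

# Uniform: every value in [low, high) is equally likely; you give a range
uniform_scores = np.random.uniform(50, 150, size=1000)

# Normal: values cluster around loc (the mean) with spread set by scale
# (the standard deviation); you give a center and a spread, not a range
scores = np.random.normal(loc=100, scale=15, size=1000)

print(uniform_scores.min(), uniform_scores.max())  # stays inside 50..150
print(scores.min(), scores.max())                  # no hard bounds
```

Positional arguments work too: np.random.normal(100, 15, 1000) means the same thing.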

So roughly 68% of them will be within 15 of the mean, that is, between 85 and 115. Okay, now if we print out a sample of it, say the first 20, there they are.
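The 68% figure comes from the normal distribution itself: the probability of landing within one standard deviation of the mean is erf(1/√2). A minimal sketch comparing that theoretical value to the share of 1,000 generated scores between 85 and 115 (the seed is illustrative):

```python
import math
import numpy as np

mu, sigma = 100, 15
np.random.seed(2)  # illustrative seed
scores = np.random.normal(mu, sigma, size=1000)

# Theoretical share within one standard deviation: erf(1 / sqrt(2))
expected = math.erf(1 / math.sqrt(2))

# Empirical share of scores strictly between mu - sigma and mu + sigma
within = np.mean((scores > mu - sigma) & (scores < mu + sigma))

print(f"expected: {expected:.4f}")  # expected: 0.6827
print(f"observed: {within:.4f}")
```

The observed share will vary a little from run to run, typically by a percentage point or two at this sample size.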

You can see they're roughly clustered around 100. But there are some values farther out, right? Here's one right there, though it's not too much of an outlier.

Again, it's only about one standard deviation away. The same goes for the last 20: about two-thirds of them are within one standard deviation. Okay, now let's graph this.
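Inspecting the first and last 20 values can be sketched like this. With only 20 values per slice, the count within one standard deviation hovers around 13 or 14 but can easily land a few above or below; the seed and loop are illustrative.

```python
import numpy as np

mu, sigma = 100, 15
np.random.seed(3)  # illustrative seed
scores = np.random.normal(mu, sigma, size=1000)

first_20 = scores[:20]
last_20 = scores[-20:]

for name, chunk in [("first 20", first_20), ("last 20", last_20)]:
    # Count how many values in this small slice fall within one sigma of mu
    n_within = int(np.sum(np.abs(chunk - mu) < sigma))
    print(name, np.round(chunk, 1))
    print(f"  {n_within} of {len(chunk)} within one standard deviation")
```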

And I think you'll see something interesting starting to emerge. Our X is our 1,000 scores. Let's keep the number of bins at 20.

Oh, and we tell PyPlot to show the graph. And here they are: you can see it's a bell curve.
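The histogram step, sketched with 1,000 scores and 20 bins as in the walkthrough. The axis labels and seed are illustrative additions.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(4)  # illustrative seed
scores = np.random.normal(100, 15, size=1000)

# x is the 1,000 scores; 20 equal-width bins span their min to max
counts, edges, patches = plt.hist(scores, bins=20)
plt.xlabel("score")
plt.ylabel("count")
plt.show()
```

plt.hist returns the bar heights and bin edges along with the drawn patches, which is handy for checking the numbers behind the picture.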

It's a little off because our sample is so small that it doesn't quite match the ideal curve. For example, it seems like we got nothing under 60, but more values out in the tail on this side. However, the more samples we draw, the more the histogram smooths out, and a larger sample also lets us use finer bins without the bars getting noisy.
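One way to make "the curve smooths out" measurable is to compare a density histogram against the true normal PDF for a small and a large sample using the same bins. This is a sketch under assumptions: normal_pdf and max_hist_error are hypothetical helper names, and the error measure (largest gap between bar height and the true density at each bin center) is just one simple choice.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def max_hist_error(n, bins, mu=100, sigma=15, seed=5):
    """Largest gap between a density histogram of n samples and the true PDF."""
    np.random.seed(seed)  # illustrative seed so each call is reproducible
    samples = np.random.normal(mu, sigma, size=n)
    heights, edges = np.histogram(samples, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    return np.max(np.abs(heights - normal_pdf(centers, mu, sigma)))

# The same 20 bins for both sample sizes, spanning mu +/- 4 sigma (40 to 160)
bin_edges = np.linspace(40, 160, 21)
for n in (1_000, 250_000):
    print(f"n = {n:>7,}: max error {max_hist_error(n, bin_edges):.5f}")
```

Fixing the bin edges keeps the comparison fair; otherwise the larger sample's wider min-to-max range would also change the bin width.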

So let's do the same thing with 250,000 samples. We still want a mean of 100 and a standard deviation of 15, but this time we ask for 250,000 values. Then let's make a histogram where x is those scores.

Actually, let's increase the number of bins to get a bit more granular. We'll use 100 bins.

Yeah, I think that'll look much better. What we get is a much closer match to the standard bell curve, though there's still a little weirdness up at the top. It's still a little jagged and not fully smooth, but it's much smoother.
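A side-by-side sketch of the two runs (1,000 samples with 20 bins versus 250,000 samples with 100 bins). Using density=True here is an assumption not made in the walkthrough; it puts both panels on the same vertical scale so their shapes can be compared directly. The seed and figure size are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(6)  # illustrative seed
small = np.random.normal(100, 15, size=1_000)
large = np.random.normal(100, 15, size=250_000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
# density=True puts both panels on the same vertical scale despite different n
ax1.hist(small, bins=20, density=True)
ax1.set_title("1,000 samples, 20 bins")
heights, edges, _ = ax2.hist(large, bins=100, density=True)
ax2.set_title("250,000 samples, 100 bins")
for ax in (ax1, ax2):
    ax.set_xlabel("score")
plt.tight_layout()
plt.show()
```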

And you can see that the larger the sample, the more things even out, even with randomness.
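This evening-out is the law of large numbers at work. A minimal sketch showing the sample mean and standard deviation settling toward 100 and 15 as the sample grows; the particular sample sizes and seed are illustrative.

```python
import numpy as np

np.random.seed(7)  # illustrative seed
for n in (100, 1_000, 10_000, 250_000):
    sample = np.random.normal(100, 15, size=n)
    # Sample statistics drift toward mu = 100 and sigma = 15 as n grows
    print(f"n = {n:>7,}: mean = {sample.mean():7.2f}, std = {sample.std():6.2f}")
```

The typical error in the mean shrinks in proportion to 1/√n, so going from 1,000 to 250,000 samples cuts it by a factor of about 16.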