Skip to main content
Colin Jaffe/2 min read

Understanding K-Nearest Neighbors in Supervised Learning

Machine Learning Essentials

Supervised vs Unsupervised

Labeled data vs unlabeled — different problem classes.

Classification vs Regression

Predict a class label vs a continuous number.

Train/Test Split

Always evaluate on data the model never saw during training.

Hyperparameter Tuning

Grid search and cross-validation to find the best settings.

Master Machine Learning at Noble Desktop

Noble Desktop's Python Machine Learning Bootcamp covers scikit-learn, Keras, neural networks, and applied ML.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Classify new data points by comparing them to their closest labeled neighbors. Watch this tutorial to learn the key concepts and techniques.

All right, so let's talk about how k-nearest neighbors works. Typically, you set k to three. We're going to look at two of our three nearest neighbors and see which class they belong to.

So in this example, we've got some things classified as green triangles, just to visualize them, and some things classified as yellow squares. Those could be, again, like, you know, dogs and cats by weight and height. I don't know.

But some points on an X and Y according to a couple of variables. And what k-nearest neighbors is going to do is we're going to train them on this data. We're going to say that's a cat or a yellow square or whatever.

That's a dog or a green triangle or whatever. Actually, we're just giving it numbers. And it knows this number; this one over here is this number; that one over there is this number.

So it learns from that where these things are in space, in their X and Y values. Then we can throw in a new instance. Once we've trained the model, saying, "Here's all the data, " we can then say, "Hey, I want to test you."

What do you think this is? And it runs a fairly simple algorithm. Again, it's unlike many other machine learning models. It's not coming up with a new algorithm.

It's always using the same k-nearest neighbors algorithm, which is why this is a slightly different kind of machine learning model. So this supervised learning model, it's going to say, okay, I'm going to look, according to my k-nearest neighbors algorithm, you told me to use a three.

I'm going to look at the three closest neighbors. And it's going to look at this green triangle and this yellow square and say those are the closest ones and more of them are green triangles than yellow squares. I'm going to guess that's a green triangle, whatever class that is.

And that's how k-nearest neighbors works. We'll be exploring that more in a moment.