Skip to main content
Colin Jaffe/3 min read

Building a K-Neighbors Classifier

KNN Build Steps

1

Import & Prep Data

Scale features (KNN is distance-based — scale matters!).

2

Pick K

Start with sqrt(n) or use cross-validation to find optimal.

3

Train & Predict

knn.fit(X_train, y_train); knn.predict(X_test).

4

Tune & Evaluate

GridSearchCV over k and distance metric (Euclidean vs Manhattan).

Master Machine Learning at Noble Desktop

Noble Desktop's Python Machine Learning Bootcamp covers scikit-learn, Keras, neural networks, and applied ML.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Train a k-neighbors classifier on the given dataset and test it on a new data point. Watch this tutorial to learn the key concepts and techniques.

Let's start a k-nearest neighbors classifier, our supervised machine learning model based on the values we've got so far. This time, we'll save the zipped version of X and Y. We'll create a list of data points by zipping X and Y together, and we can now look at the data points. Here they are.

This is similar to before, but now we've saved them because we'll feed this data to the k-nearest neighbors classifier. First, we create our k-nearest neighbors classifier. We’ll define the KNN model as our k-nearest neighbors classifier.

We set the number of neighbors to three. Three is a typical number, and five is also common. This defines how many neighbors to consider when looking for the majority class.

The more neighbors you include, the more accurate your model will be, but also the more time-intensive it will be to train. Three is a good midpoint. These days, people often use five neighbors.

Again, since computers are getting faster, this process is becoming more efficient. It also depends on the complexity of your data. One is generally considered too low now.

It must be an odd number to avoid ties when determining the majority class. If we used an even number of neighbors, like four or two, there could be a tie. We want to get a definitive prediction, not an uncertain one.

That’s why we use an odd number of neighbors. Let’s run that. Now we have a KNN model, but it's not trained yet.

Training or fitting our model involves providing it with X and Y data. In this case, our X represents the data points and their X, Y coordinates. Classes represent the final category each point belongs to, either 0 or 1.

Let's run this, and I'll explain it. KNNModel.fit: We'll pass the data points and their corresponding classes. Now we have a trained model!

It is now trained on the data, in a slightly different way than the previous model. It didn't create a new algorithm. Instead, it now has all the data it needs to apply the k-nearest neighbors algorithm to any new points we test.

Let’s test it, in fact. First, we'll add a new data point. Then we'll test how well it predicts the class of this new point.

We’ll add a new data point, and then we’ll test how well it predicts the class for that point.

We want to obtain a definitive prediction, not an uncertain one.

Let’s test it now. We’ll add a new data point and observe how well the model predicts it.