Colin Jaffe/2 min read

Versicolor and Virginica Misclassification in KNN Models

Why KNN Confuses Versicolor and Virginica

Overlapping Feature Space

These two species share similar petal and sepal measurements.

Setosa is Linearly Separable

By contrast, Setosa sits clearly apart from the other two species, so it is easy to classify.
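One way to see both the overlap and the separation is to compare the range of a single feature per species. This is a minimal sketch, assuming scikit-learn's bundled copy of the iris dataset (`load_iris`); the `petal_length_range` dictionary is a name introduced here for illustration, not something from the lesson.

```python
from sklearn.datasets import load_iris

iris = load_iris()
# Column order in load_iris: sepal length, sepal width, petal length, petal width
PETAL_LENGTH = 2

# Hypothetical helper dict: species name -> (min, max) petal length in cm
petal_length_range = {}
for label, name in enumerate(iris.target_names):
    values = iris.data[iris.target == label, PETAL_LENGTH]
    petal_length_range[str(name)] = (float(values.min()), float(values.max()))

for name, (low, high) in petal_length_range.items():
    print(f"{name:<10} petal length {low:.1f} to {high:.1f} cm")
```

If the Setosa range ends before the Versicolor range begins, while the Versicolor and Virginica ranges overlap, that matches the picture described above.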

Choice of k Matters

Small k overfits noise; large k blurs the decision boundary.
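The effect of k can be explored by scoring the same model at several values. The sketch below is an illustration only: the particular k values and the use of 5-fold cross-validation are assumptions made here, not settings taken from the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical sweep: mean 5-fold accuracy for a few values of k
k_scores = {}
for k in (1, 3, 5, 15, 51):
    model = KNeighborsClassifier(n_neighbors=k)
    k_scores[k] = float(cross_val_score(model, X, y, cv=5).mean())

for k, score in k_scores.items():
    print(f"k={k:>2}: mean accuracy {score:.3f}")
```

On a dataset this small the differences between nearby values of k may be slight, so the scores are best read as a rough guide rather than a ranking.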

Try Different Distance Metrics

Switching between Manhattan and Euclidean distance can change which training points count as the nearest neighbors, and so which class wins the vote.
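In scikit-learn the distance metric is a constructor argument, so the two can be compared side by side. This is a sketch under assumptions: k is fixed at 3 as in the lesson, but the 5-fold cross-validation and the `metric_scores` name are choices made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Same k, two distance metrics; only the notion of "nearest" changes
metric_scores = {}
for metric in ("euclidean", "manhattan"):
    model = KNeighborsClassifier(n_neighbors=3, metric=metric)
    metric_scores[metric] = float(cross_val_score(model, X, y, cv=5).mean())

for metric, score in metric_scores.items():
    print(f"{metric:<10} mean accuracy {score:.3f}")
```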


This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

This lesson analyzes how the KNN algorithm misclassified one Virginica sample as Versicolor because of its proximity to Versicolor data points.

Let's analyze this classification report to see what we missed and how we missed it. Looking at it, we can see the precision for Versicolor was imperfect. What does that mean? Remember, precision is how often a guess for a category was correct, out of all our guesses for that category. We guessed Versicolor a number of times, and 90% of the time we were right, but one of those guesses was wrong.

We missed one prediction. We said it was Versicolor, but it wasn't. We can see what it actually was because one other category has imperfect recall: Virginica.

Recall, remember, is how often we guessed a category correctly out of how many times it actually was that category. So for Virginica, it's how often we guessed Virginica out of all the samples that really were Virginica.

For 90% of the actual Virginicas, we were like, "Yeah, that's Virginica." But there was one that we missed.
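Both definitions can be checked on a small reconstruction. The label arrays below are not the lesson's actual test data; they are made-up arrays built to match the figures quoted here (one Virginica predicted as Versicolor, 90% Versicolor precision, 90% Virginica recall), and the count of 11 Setosa samples is an assumption chosen so the total comes to 30.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Reconstructed labels (an assumption): 0 = Setosa, 1 = Versicolor, 2 = Virginica
y_true = np.array([0] * 11 + [1] * 9 + [2] * 10)
y_pred = y_true.copy()
y_pred[-1] = 1  # one actual Virginica is predicted as Versicolor

# average=None returns one score per class, in label order
per_class_precision = precision_score(y_true, y_pred, average=None)
per_class_recall = recall_score(y_true, y_pred, average=None)

# Versicolor precision: 9 correct out of 10 Versicolor guesses
print(f"Versicolor precision: {per_class_precision[1]:.2f}")
# Virginica recall: 9 found out of 10 actual Virginicas
print(f"Virginica recall:     {per_class_recall[2]:.2f}")
```

The same single mistake lowers two different numbers: it is a wrong Versicolor guess (precision) and a missed Virginica (recall).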

So, there was a Virginica that we miscategorized as a Versicolor. Here, our model predicted this one as a 1, a Versicolor, but it was actually a 2, a Virginica.
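Finding such rows comes down to comparing predictions against the true labels. The sketch below assumes a 20% test split with `random_state=0`, which is a choice made here and not necessarily the lesson's split, so the rows it reports (if any) may differ from the one discussed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# Assumed split: 30 of the 150 samples held out; random_state is arbitrary
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0
)

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Positions in the test set where the prediction disagrees with the truth
miss_idx = np.flatnonzero(y_pred != y_test)
for i in miss_idx:
    predicted = iris.target_names[y_pred[i]]
    actual = iris.target_names[y_test[i]]
    print(f"test row {i}: predicted {predicted}, actually {actual}")
print(f"{len(miss_idx)} miss(es) out of {len(y_test)} test samples")
```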

We could take a deep dive into exactly why that happened. The short answer is that this particular Virginica was closer to some Versicolors than to other Virginicas. It was a bit of an outlier toward the Versicolor side.

Although, again, 'sides' implies the data is one-dimensional, when in fact it's four-dimensional. Its petal length, petal width, sepal length, and sepal width were just slightly closer to the Versicolors than to the Virginicas. Or rather, it was closer to more of them, because we're checking the K nearest neighbors, and K is 3. So, looking at the three nearest neighbors, more of them were Versicolor than Virginica, even though this one actually was a Virginica.
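The neighbor vote behind a prediction can be inspected directly with `kneighbors`. This is a sketch under assumptions: the split (`test_size=0.2`, `random_state=0`) is chosen here rather than taken from the lesson, and the `query` row is the first misclassified test sample if there is one, otherwise simply the first test sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
names = [str(n) for n in iris.target_names]
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0  # assumed split
)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Pick a row to inspect: a misclassified one if any exist, else row 0
misses = np.flatnonzero(y_pred != y_test)
query = int(misses[0]) if misses.size else 0

# Distances to, and training-set indices of, the 3 nearest neighbors
distances, neighbor_idx = model.kneighbors(X_test[[query]], n_neighbors=3)
neighbor_labels = y_train[neighbor_idx[0]]
votes = np.bincount(neighbor_labels, minlength=3)

for dist, label in zip(distances[0], neighbor_labels):
    print(f"neighbor at distance {dist:.2f}: {names[label]}")
print("votes per class:", dict(zip(names, votes.tolist())))
print("predicted:", names[y_pred[query]], "| actual:", names[y_test[query]])
```

The distances are measured across all four features at once, which is why "closer" here does not correspond to any single measurement.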

Still, Versicolor and Virginica are very close to each other in the data.

Overall, we got a 97% score, 96.6 repeating. That's very good, and it's a testament to how effective K-nearest neighbors is as an algorithm: even across multiple dimensions, we can identify what something is based on the data we've seen before.

And that's K-Nearest Neighbors.