Colin Jaffe/2 min read

K-Nearest Neighbors with Iris Flower Data Visualization

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Visualize how K-Nearest Neighbors classifies iris species using multidimensional data. Watch this tutorial to learn the key concepts and techniques.

Let's take a look at some images to visualize what we're doing with these irises. If you run this code block, you get an image of an iris from one particular species, Versicolor, with the sepal length and width marked. This is the length; this is the width. These are sepals, and these are petals.

You don't need a lot of domain knowledge about flowers to do this one. We can graph sepal width against sepal length for every flower in the dataset and get a graph that may help us understand how K-Nearest Neighbors is going to work with this. So that's the next code block right here.

Run that, and you can see the three species we're going to work with: Setosa, Versicolor, and Virginica. Looking at sepal width and length, we have many setosas over here, many virginicas over here, and many versicolors over here. When we have a new item like this one, it's pretty obvious that it's going to be a Virginica.

The nearest neighbors are definitely the Virginica ones. However, the issue we'll face is that it's pretty easy for us humans to look at this data and say which species any dot belongs to when we're only looking at two dimensions, sepal width and sepal length. It's much harder to eyeball when we're actually looking at sepal length, sepal width, petal length, and petal width.
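The idea of a new dot taking on the species of its nearest neighbors can be sketched in plain Python. This is only a minimal sketch, not the lesson's code block: the nine measurements are hand-typed illustrative values rather than the full dataset, and the names `labeled`, `classify`, and `new_flower`, along with the choice of k = 3, are assumptions made for this example.

```python
import math
from collections import Counter

# Illustrative (sepal_length_cm, sepal_width_cm) points, hand-typed for this
# sketch. They are NOT the lesson's full dataset.
labeled = [
    ((5.1, 3.5), "setosa"),
    ((4.9, 3.0), "setosa"),
    ((4.7, 3.2), "setosa"),
    ((5.5, 2.3), "versicolor"),
    ((5.7, 2.8), "versicolor"),
    ((6.0, 2.9), "versicolor"),
    ((7.1, 3.0), "virginica"),
    ((7.6, 3.0), "virginica"),
    ((7.3, 2.9), "virginica"),
]


def classify(point, labeled_points, k=3):
    """Return the majority species among the k labeled points closest to `point`."""
    # Sort every labeled point by its straight-line distance to the new point.
    nearest = sorted(labeled_points, key=lambda item: math.dist(point, item[0]))[:k]
    # Each of the k nearest neighbors casts one vote for its own species.
    votes = Counter(species for _, species in nearest)
    return votes.most_common(1)[0][0]


# A hypothetical new flower that sits among the virginica points.
new_flower = (7.4, 3.1)
print(classify(new_flower, labeled))  # prints "virginica"
```

With only two measurements per flower, this is the same judgment we make by eye on the scatter plot: the three closest dots are all Virginica, so the new dot is labeled Virginica.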

Now, that is four dimensions, four variables, and it's hard for us to visualize things in four dimensions. But for the computer, it's actually quite easy. It can calculate the distance between a new point and every other point across all four dimensions, working in four-dimensional space, and find the neighbors with the smallest distances.
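To see why four dimensions are no harder for the computer than two, here is a small sketch of the distance calculation itself. It assumes straight-line (Euclidean) distance, which is a common choice for K-Nearest Neighbors but is not confirmed by the lesson; the function name `euclidean`, the three labeled flowers, and the new flower's measurements are illustrative assumptions.

```python
import math

# (sepal_length, sepal_width, petal_length, petal_width) in cm.
# One illustrative flower per species, hand-typed for this sketch.
labeled = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
]


def euclidean(a, b):
    """Straight-line distance between two points with the same number of dimensions."""
    # Square the difference along each dimension, add them up, take the root.
    # Nothing here cares whether there are 2, 4, or 40 dimensions.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# A hypothetical new flower described by all four measurements.
new_flower = (6.5, 3.0, 5.5, 2.0)

# Distance from the new flower to every labeled flower, smallest first.
distances = sorted(
    (euclidean(new_flower, point), species) for point, species in labeled
)
for distance, species in distances:
    print(f"{species}: {distance:.2f}")

closest_species = distances[0][1]  # "virginica" for these values
```

The only thing that changes when moving from two measurements to four is the length of each tuple; the distance function stays the same.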

This is very hard for us to work with. Even three-dimensional space becomes much more challenging for us, let alone four, five, or six. So that's where K-Nearest Neighbors will really help us: working with higher-dimensional datasets, as we'll see throughout this lesson.