Colin Jaffe/2 min read

Exploring KNN with the Iris Dataset in Python

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Apply the K-Nearest Neighbors algorithm to classify iris flowers using the sklearn iris dataset. Watch this tutorial to learn the key concepts and techniques.

We're now going to look at applying KNN, the K-Nearest Neighbors algorithm, to a more realistic dataset. We're going to use the famous iris dataset from sklearn. The iris dataset is a collection of measurements of iris flowers: their sepal length and width, and their petal length and width.

Fortunately, you don't need to know a lot about flowers to do this. We can plot these measurements, and we can feed the sepal length, sepal width, petal length, and petal width data to a K-Nearest Neighbors algorithm. For a particular new flower, it will look at which known flowers are closest across all four features; those are that flower's nearest neighbors.
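
To make the "closest across all four features" idea concrete, here is a minimal sketch of a nearest-neighbors vote written by hand in NumPy. It is not the lesson's code: the five "training" flowers are made-up measurements for illustration, `knn_predict` is a hypothetical helper name, and the use of Euclidean distance with `k=3` is an assumption.

```python
import numpy as np
from collections import Counter

# Made-up measurements for illustration (not rows from the real dataset).
# Columns: sepal length, sepal width, petal length, petal width (cm).
train_X = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [4.9, 3.1, 1.4, 0.1],
    [6.4, 3.0, 4.5, 1.5],
    [6.0, 2.8, 4.4, 1.3],
    [6.8, 3.1, 5.8, 2.2],
])
train_y = np.array(["setosa", "setosa", "versicolor", "versicolor", "virginica"])


def knn_predict(new_flower, X, y, k=3):
    """Hypothetical helper: majority vote among the k closest flowers."""
    # Euclidean distance from the new flower to every known flower,
    # computed over all four features at once.
    distances = np.sqrt(((X - new_flower) ** 2).sum(axis=1))
    # Indices of the k smallest distances: the nearest neighbors.
    nearest = np.argsort(distances)[:k]
    # The most common label among those neighbors is the prediction.
    votes = Counter(y[nearest].tolist())
    return votes.most_common(1)[0][0]


new_flower = np.array([6.2, 2.9, 4.6, 1.4])
prediction = knn_predict(new_flower, train_X, train_y, k=3)
print(prediction)
```

In this toy data, two of the three nearest neighbors of the new flower are labeled versicolor, so the vote comes out as versicolor. The sklearn classifier used in the lesson handles this bookkeeping for us.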

And we'll find that this has surprisingly good accuracy. All right. So here are our imports.

Here are the things we'll need: NumPy and Pandas. We'll be showing you some images to help visualize this.

We also need to load the iris data. sklearn gives us a function called `load_iris` that we can use for that. We'll also import our more typical `train_test_split` and the K-Nearest Neighbors classifier model, `KNeighborsClassifier`.
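
As a rough sketch of what `load_iris` hands back, the snippet below loads the data and wraps it in a Pandas DataFrame for inspection. The variable names (`iris`, `df`) and the added `species` column are assumptions for illustration, not necessarily what the lesson's notebook uses.

```python
import pandas as pd
from sklearn.datasets import load_iris

# load_iris returns a Bunch object holding the measurements, the numeric
# species labels, and the human-readable names for both.
iris = load_iris()

# One row per flower, one column per measurement.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the numeric labels (0, 1, 2) to species names for readability.
df["species"] = iris.target_names[iris.target]

print(df.shape)
print(df.head())
print(df["species"].value_counts())
```

The four feature columns are the sepal and petal measurements described above, and the target distinguishes three iris species.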

And we'll also be using a classification report, which will show us precision, recall, and other useful evaluation metrics to see how we did. The other code we are giving you includes our Google Drive loading block. Let's run both of those.
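
Put together, the sklearn pieces named above could be used roughly as follows. This is a sketch under stated assumptions rather than the lesson's notebook: the 80/20 split, `random_state=42`, stratified sampling, and `n_neighbors=5` are all choices made here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Hold out 20% of the flowers for testing (assumed split size and seed).
X_train, X_test, y_train, y_test = train_test_split(
    iris.data,
    iris.target,
    test_size=0.2,
    random_state=42,
    stratify=iris.target,
)

# Initialize the classifier; n_neighbors=5 is an assumption for this sketch.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Predict species for the held-out flowers and summarize the results.
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```

The printed report lists precision, recall, and F1-score for each species, along with overall accuracy on the held-out flowers.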

This may take a minute if it's the first time running it, as it is for me. And we'll also want to grab Google Drive. So you'll run this block as well.

And once you've imported everything and loaded Google Drive, we'll dive into what these flowers are and what data we have to work with.