April 2, 2026 · Colin Jaffe · 3 min read

Model Accuracy: Understanding Precision and Recall Metrics

Master Machine Learning Model Evaluation Techniques

Model Evaluation Foundation

Understanding how well your machine learning model performs is crucial for building reliable predictions. This guide explores the essential metrics every data scientist needs to master.

Model Performance Overview

97%: accuracy achieved by the KNN model
3%: error rate on the test dataset
1: wrong prediction out of 30

Key Evaluation Metrics Explained

Accuracy

Measures the percentage of correct predictions out of all predictions made. Calculated using model.score() with test data and actual answers.

Precision

When the model predicts a specific category, precision tells us how often that prediction was correct. Focuses on prediction quality.

Recall

Out of all actual instances of a category, recall measures how many the model correctly identified. Focuses on detection completeness.
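The difference between the two metrics is easiest to see on a tiny example. The sketch below uses hypothetical labels and assumes scikit-learn is installed; the model guesses "positive" three times but only two of those guesses are right, and it misses half of the actual positives.

```python
# Toy illustration of precision vs recall for one class
# (hypothetical labels; assumes scikit-learn is installed).
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0]  # four actual positives
y_pred = [1, 1, 0, 0, 1, 0]  # three positive predictions, two correct

precision = precision_score(y_true, y_pred)  # 2 correct / 3 predicted = 0.667
recall = recall_score(y_true, y_pred)        # 2 found / 4 actual = 0.5
print(precision, recall)
```

High precision with low recall, as here, means the model is cautious: its positive calls are usually right, but it misses many real cases.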

Model Scoring Process

1. Prepare Test Data: Use knn_model.score() with X_test data for predictions and the corresponding y_test values for validation.

2. Calculate Accuracy: The model compares its predictions against the actual answers to determine the percentage of correct classifications.

3. Generate Classification Report: Use classification_report from sklearn.metrics for detailed precision, recall, and F1 scores by category.

Precision vs Recall Analysis

| Feature | Precision | Recall |
| --- | --- | --- |
| Definition | Correct positive predictions / all positive predictions | Correct positive predictions / all actual positives |
| Focus | Quality of predictions | Completeness of detection |
| Question answered | "When we guessed this category, how often were we right?" | "How often did we correctly identify this category?" |
Recommended: Use F1 score (harmonic mean of precision and recall) for balanced evaluation of both metrics
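As a quick check that the F1 score really is the harmonic mean, the sketch below computes it by hand and compares against sklearn's f1_score (hypothetical labels; assumes scikit-learn is installed).

```python
# F1 = harmonic mean of precision and recall; verify the formula
# against sklearn's implementation (hypothetical labels).
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0]

p = precision_score(y_true, y_pred)   # 2/3
r = recall_score(y_true, y_pred)      # 1/2
f1 = 2 * p * r / (p + r)              # harmonic mean
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-9
print(f1)
```

Because the harmonic mean is pulled toward the smaller value, F1 stays low unless both precision and recall are reasonably high, which is why it works well for imbalanced classes.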

Iris Dataset Performance by Species

Setosa: 100%
Versicolor: 95%
Virginica: 95%

Classification Report Benefits

The sklearn.metrics classification_report provides comprehensive insights into model performance, revealing which categories are most challenging and where improvements are needed.

Model Evaluation Best Practices


This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Now let's evaluate our model's performance using multiple metrics to get a comprehensive view of its accuracy. First, we'll examine the overall accuracy—the percentage of correct predictions across our entire test set. We can obtain this using the knn_model.score method, though as you'll notice, we need to provide the appropriate data parameters.

The method requires two positional arguments: X and y. This makes sense—to evaluate performance, we need both our test features (X_test) for generating predictions and our test labels (y_test) as the ground truth for comparison. When we pass in our X_test data, the model generates predictions and compares them against the actual answers to calculate our accuracy score.
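A minimal, self-contained illustration of those two positional arguments is below. The tiny one-feature dataset is hypothetical, invented only to make the call runnable; it is not the iris data from the lesson.

```python
# Minimal illustration of .score(X, y) (assumes scikit-learn;
# the tiny dataset here is hypothetical, not the iris data).
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0], [1], [10], [11]]
y_train = [0, 0, 1, 1]
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

X_test = [[2], [9]]   # features the model predicts from
y_test = [0, 1]       # ground truth it compares against
print(knn.score(X_test, y_test))  # fraction of correct predictions
```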

Our results show 97% accuracy—an impressive performance that indicates we misclassified only 3% of our test samples. Given our test set of 30 samples, this translates to exactly one incorrect prediction. While this single error might seem insignificant, understanding where and why our model fails provides valuable insights for improvement.
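The arithmetic behind that claim is easy to reproduce directly from the prediction arrays. The labels below are hypothetical stand-ins (30 samples, one versicolor mislabeled as virginica) chosen to match the numbers in this lesson; numpy is assumed.

```python
# Counting misclassifications directly from predictions
# (hypothetical labels shaped to match the lesson's 30-sample test set).
import numpy as np

y_test = np.array([0] * 10 + [1] * 10 + [2] * 10)  # 30 true labels
y_pred = y_test.copy()
y_pred[15] = 2  # one versicolor (1) mislabeled as virginica (2)

accuracy = (y_pred == y_test).mean()
errors = int((y_pred != y_test).sum())
print(accuracy, errors)  # accuracy = 29/30 (about 97%), errors = 1
```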

Rather than manually scanning through predictions to identify the misclassified sample, we can leverage sklearn's classification report for a more systematic analysis. This tool provides granular performance metrics that reveal not just what went wrong, but which classes are most challenging for our model.

The classification report delivers three key metrics that every data scientist should understand: precision, recall, and F1-score. Precision answers "When we predicted a specific class, how often were we correct?"—essentially measuring the reliability of our positive predictions. Recall addresses "Of all actual instances of a class, how many did we successfully identify?"—capturing our model's ability to find all relevant cases. The F1-score provides the harmonic mean of precision and recall, offering a balanced metric that's particularly useful when dealing with imbalanced datasets.

To generate this comprehensive analysis, we'll import the classification_report function from sklearn.metrics. The function requires our true labels and predicted values, and we can enhance readability by including the iris dataset's target names—'setosa', 'versicolor', and 'virginica'—rather than working with numerical labels.
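Besides the printed text report, classification_report can return a dictionary (via output_dict=True) so individual metrics can be inspected programmatically. The labels below are hypothetical, arranged so that one of two versicolor samples is missed.

```python
# classification_report as text and as a dict (assumes scikit-learn;
# the true/predicted labels here are hypothetical).
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
names = ["setosa", "versicolor", "virginica"]

print(classification_report(y_true, y_pred, target_names=names))

report = classification_report(y_true, y_pred, target_names=names,
                               output_dict=True)
print(report["versicolor"]["recall"])  # 0.5: one of two versicolor found
```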

The resulting report reveals interesting patterns in our model's performance. We achieved perfect precision and recall (1.00) for setosa classification, indicating this species is easily distinguishable from the others based on our selected features. However, our model shows slight confusion between versicolor and virginica species, which is common given their overlapping characteristics in feature space. This granular breakdown helps us understand that while our overall accuracy is excellent, there's room for improvement in distinguishing between these two similar species.

In our next analysis, we'll dive deeper into this confusion pattern and explore techniques for improving classification performance on these challenging boundary cases.

Key Takeaways

1. Model accuracy of 97% indicates strong performance, with only 1 wrong prediction out of 30 test cases
2. The model.score() function requires both X_test data and the actual y values to calculate accuracy
3. Precision measures prediction quality: correct positive predictions divided by all positive predictions
4. Recall measures detection completeness: correct positive predictions divided by all actual positives
5. Classification reports from sklearn.metrics provide a detailed performance breakdown by category
6. Perfect precision and recall were achieved for setosa classification, with minor errors between versicolor and virginica
7. The F1 score provides balanced evaluation by taking the harmonic mean of precision and recall
8. Including target names in classification reports improves readability and interpretation of results
