April 2, 2026 · Colin Jaffe · 3 min read

Evaluating Model Predictions Against Test Data for Accuracy

Testing Machine Learning Models with Real Data

Model Testing Overview

Testing a machine learning model is like giving it a quiz after training. You provide data it has never seen before to evaluate how well it learned the patterns from the training phase.

Model Testing Process

1. Prepare Test Data: use the data withheld during training; your model has never seen it before.
2. Generate Predictions: call model.predict() with only the X_test features, without providing the correct answers.
3. Compare Results: compare the model's predictions against the actual Y_test values to assess accuracy.
4. Evaluate Performance: use mathematical metrics to quantify how close the predictions are to reality.
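The four-step process above can be sketched end to end with scikit-learn. This is a minimal illustration, not the lesson's actual dataset: the synthetic feature, the noise level, and the variable names here are all placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Illustrative data: one feature with a noisy linear relationship to the target
rng = np.random.default_rng(42)
X = rng.uniform(0, 50, size=(120, 1))
y = 0.9 * X[:, 0] + rng.normal(0, 2, size=120)

# Step 1: withhold test data the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 2: train, then predict on X_test only -- no answers provided
model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Steps 3-4: compare predictions to the withheld y_test values
errors = predictions - y_test
print(f"Mean absolute error: {abs(errors).mean():.2f}")
```

The key pattern is that `y_test` is touched only in the comparison step, never during fitting or prediction.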

Understanding Model Predictions

Training Phase

The model learns patterns from training data, establishing relationships between input features and target outcomes. This is like studying for an exam.

Testing Phase

The model makes predictions on unseen data without access to correct answers. This reveals how well it generalized from training.

Evaluation Phase

Compare predictions to actual results to measure accuracy. This determines if the model is ready for real-world use.

Test Dataset Information

The test dataset contains 31 rows.

Sample Prediction Results

| Sample | Model Prediction | Actual Value |
|--------|------------------|--------------|
| Sample 1 | 26.6 | 31.39 |
| Sample 2 | 16.6 | 19 |
| Sample 3 | 14.69 | 22 |
| Sample 4 | 39 | 46 |
| Sample 5 | 19.39 | 19.58 |
Note: some predictions are very close while others show larger deviations, indicating the need for quantitative evaluation metrics.

Visual Inspection of Results

Pros:
- Quick way to spot obvious patterns in prediction accuracy
- Easy to identify which predictions are reasonably close
- Helps build intuition about model performance
- Can reveal systematic over- or under-prediction

Cons:
- Subjective assessment lacking precise measurement
- Difficult to scale with larger datasets
- Cannot provide quantitative performance metrics
- May miss subtle but important accuracy issues
Beyond Visual Inspection

While eyeballing predictions gives initial insights, mathematical metrics are essential for objective model evaluation and comparison between different approaches.

Model Testing Best Practices

"Some of them are going to be correct, and some of them are going to be off. But they're all reasonably close."

Initial assessment showing the model has learned useful patterns, though not perfect accuracy.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Now comes the critical moment: testing our model's performance against real-world data. We've deliberately withheld a portion of our dataset—the test data—to evaluate how well our model generalizes beyond the training examples it has already seen.

Think of this as administering a final exam. After teaching a student subtraction, we ask "What is 10 minus 6?" After training a model to distinguish cats from dogs, we present an image it has never encountered and ask for classification. This evaluation phase reveals whether our model has truly learned underlying patterns or simply memorized training examples—a crucial distinction that separates robust models from brittle ones.

Our test dataset contains 31 rows, manageable enough for manual inspection. Let's create a variable called `model_predictions` and assign it the output of `model.predict()`. This predict method, now available on our trained model object, represents the culmination of our training process.

Notice the critical difference in our approach here: we pass only the X_test features, deliberately withholding the Y_test target values. The model must make predictions based solely on input features, mimicking real-world scenarios where ground truth labels are unknown. This blind prediction process provides an unbiased assessment of model performance.
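In code, this blind prediction step is a single call. The stand-in model and synthetic features below are illustrative; only the `model_predictions = model.predict(X_test)` pattern and the 31-row test size come from the lesson.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in trained model (the lesson's actual model is not shown here)
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 50, size=(90, 1))
y_train = 1.1 * X_train[:, 0] + rng.normal(0, 1.5, size=90)
model = LinearRegression().fit(X_train, y_train)

X_test = rng.uniform(0, 50, size=(31, 1))  # 31 rows, as in the lesson

# Pass only the features -- the model never sees the target values
model_predictions = model.predict(X_test)
print(model_predictions[:5])
```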

The resulting predictions certainly look promising at first glance. But appearances can be deceiving in machine learning—we need quantitative validation. Fortunately, we retained the actual Y_test values for precisely this comparison.

To facilitate side-by-side analysis, we'll convert the Y_test pandas series to a list format, matching the structure of our model predictions. This formatting consistency makes visual comparison more straightforward and reduces cognitive overhead when scanning results.
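Assuming `Y_test` is a pandas Series, that conversion and the side-by-side printout might look like the sketch below, using the sample values from the table above.

```python
import pandas as pd

# Illustrative values echoing the sample results table
model_predictions = [26.6, 16.6, 14.69, 39.0, 19.39]
y_test = pd.Series([31.39, 19.0, 22.0, 46.0, 19.58])

# Convert the Series to a plain list so both sides print the same way
y_test_list = list(y_test)

for pred, actual in zip(model_predictions, y_test_list):
    print(f"predicted {pred:>6.2f}  |  actual {actual:>6.2f}")
```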

Initial inspection reveals a mixed performance profile typical of regression models. Consider the prediction of 26.6 versus the actual value of 31.39—a difference of roughly 15%, which falls within acceptable bounds for many business applications. Similarly, our model's guess of 16.6 against the true value of 19 represents reasonable accuracy.

However, not all predictions demonstrate such precision. The comparison of 14.69 to 22 reveals approximately 33% error—significant enough to warrant investigation. The fourth prediction shows even greater deviation, highlighting the inherent challenges in predictive modeling and the importance of comprehensive evaluation metrics.

Yet encouraging signals emerge from the data. The prediction of 39 compared to the actual 46 demonstrates solid directional accuracy, while 19.39 versus 19.58 achieves remarkable precision—less than 1% error. These variations underscore a fundamental truth: model performance rarely follows a uniform distribution across all test cases.
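The percentage figures quoted in the last few paragraphs can be reproduced directly. In this sketch, error is measured relative to the actual value, which is an assumption about how the figures were computed.

```python
predictions = [26.6, 16.6, 14.69, 39.0, 19.39]
actuals = [31.39, 19.0, 22.0, 46.0, 19.58]

for pred, actual in zip(predictions, actuals):
    # Absolute error as a percentage of the true value
    pct_error = abs(pred - actual) / actual * 100
    print(f"{pred:>6.2f} vs {actual:>6.2f}: {pct_error:5.1f}% error")
```

Running this shows the spread described above: roughly 15% on the first sample, about 33% on the third, and under 1% on the fifth.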

While manual inspection provides valuable intuition, professional model evaluation demands systematic measurement. Visual assessment, though instructive, introduces subjectivity and scales poorly with larger datasets. Fortunately, established statistical metrics can quantify prediction accuracy with mathematical precision, providing the objective foundation needed for confident model deployment.
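Standard regression metrics from scikit-learn can formalize this comparison. The sketch below applies them to the illustrative sample values from the table; the lesson itself introduces these metrics in a later step.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

predictions = [26.6, 16.6, 14.69, 39.0, 19.39]
actuals = [31.39, 19.0, 22.0, 46.0, 19.58]

# Average absolute miss, in the target's own units
mae = mean_absolute_error(actuals, predictions)
# Root mean squared error penalizes large misses more heavily
rmse = mean_squared_error(actuals, predictions) ** 0.5
# R^2: fraction of the target's variance the predictions explain
r2 = r2_score(actuals, predictions)

print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}  R^2: {r2:.2f}")
```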

Key Takeaways

1. Model testing uses withheld data that the algorithm has never seen during training, similar to giving a student a quiz after learning.
2. The model.predict() method generates predictions using only input features, without access to correct answers.
3. Test datasets can be small enough for manual inspection; this example used only 31 rows for evaluation.
4. Converting data types ensures consistent comparison between model predictions and actual values.
5. Visual inspection reveals that some predictions are very accurate while others deviate significantly from actual values.
6. Sample results showed mixed accuracy: some predictions within 1% of actual values, others off by 30% or more.
7. Eyeballing results provides initial insights, but quantitative metrics are needed for proper evaluation.
8. The next step involves implementing mathematical measures to objectively assess prediction accuracy.
