April 2, 2026 · Colin Jaffe · 3 min read

Precision and Recall: Improving Predictive Model Accuracy

Master Model Evaluation with Precision and Recall Metrics

Model Performance Overview

Overall Accuracy: 77%
Correct "Left" Predictions: 730
Total "Left" Predictions: 1,350

Precision vs Recall: Key Differences

Feature | Precision | Recall
Question Asked | Of predictions made, how many correct? | Of actual cases, how many caught?
Focus | Quality of positive predictions | Coverage of actual positives
Use Case Priority | Minimize false positives | Minimize false negatives

Recommended: Choose based on the cost of missing true cases versus incorrectly flagging false cases.

Model Prediction Results

Precision Score: 54%
Recall Score: 26%
Overall Accuracy: 77%
Performance Trade-off

High overall accuracy (77%) but low recall (26%) indicates the model excels at predicting employees who stay but struggles to identify those who will leave.

Understanding the Metrics

Precision Analysis

Of 1,350 predictions that employees would leave, 730 were correct. This 54% precision means about half of departure predictions are accurate.

Recall Analysis

Of approximately 2,800 employees who actually left, only about 26% were correctly identified. The model misses most actual departures.
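Both scores follow directly from the counts above. A quick arithmetic check (the 2,800 figure for actual departures is approximate, per the text):

```python
# Worked arithmetic from the figures in this lesson
true_positives = 730        # "left" predictions that were correct
predicted_positives = 1350  # all "left" predictions the model made
actual_positives = 2800     # employees who actually left (approximate)

precision = true_positives / predicted_positives
recall = true_positives / actual_positives

print(f"Precision: {precision:.0%}")  # 54%
print(f"Recall: {recall:.0%}")        # 26%
```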

Business Impact

Strong at predicting retention but weak at identifying flight risk. Consider whether missing departures or raising false alarms is more costly for planning.

Medical Diagnosis Parallel

In medical testing, false negatives (telling sick patients they're healthy) are typically more dangerous than false positives (telling healthy patients they might be sick), especially for contagious or progressive diseases.

False Positive vs False Negative Costs

Feature | False Positive | False Negative
Medical Context | Unnecessary treatment/anxiety | Missed diagnosis, disease progression
Employee Context | Retention effort for a staying employee | Missed departure, no succession plan
Typical Preference | More acceptable | Usually more costly

Recommended: Generally prefer false positives over false negatives in high-stakes scenarios.

Model Optimization Strategy

1. Assess Business Cost

Determine whether missing actual departures or incorrectly predicting departures is more expensive for your organization.

2. Adjust Decision Threshold

Lower the threshold to catch more departures (improve recall) or raise it to reduce false alarms (improve precision).

3. Monitor Trade-offs

Track how threshold changes affect both metrics and overall business outcomes to find the optimal balance.
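Steps 2 and 3 can be sketched as a threshold sweep over predicted probabilities. The model and data below are synthetic stand-ins for illustration, not the lesson's actual dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the employee data (imbalanced: "left" is the minority class)
X, y = make_classification(n_samples=1000, weights=[0.8], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # probability of the "left" class

# Sweep the decision threshold: lowering it trades precision for recall
for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    p = precision_score(y_test, preds, zero_division=0)
    r = recall_score(y_test, preds)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Because lowering the threshold can only add positive predictions, recall never decreases as the threshold drops; precision usually moves the other way.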

When the model does get it wrong, in which direction do we want it to err?
This fundamental question should drive model optimization decisions based on the relative costs of different types of errors in your specific use case.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Let's examine two critical metrics from sklearn that reveal the nuanced performance of our model—what it got right and where it stumbled. While our overall performance was respectable, the model excelled at predicting employee retention but struggled significantly with identifying departures. Understanding these specific failure modes is crucial for real-world deployment.

Precision, computed with sklearn's precision_score function, measures predictive accuracy for positive cases. We pass it y_test and our predictions, and it answers a fundamental question: of all the times we predicted someone would leave, how often were we actually correct? Looking at our confusion matrix's right-hand column, we predicted departures 1,350 times and got 730 right, yielding a precision of approximately 54%. This means nearly half of our "departure" predictions were false alarms.
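A minimal precision_score sketch; the labels here are invented for illustration, not drawn from the lesson's dataset:

```python
from sklearn.metrics import precision_score

# Synthetic labels for illustration: 1 = "left", 0 = "stayed"
y_test = [1, 1, 1, 0, 0]   # ground truth
y_pred = [1, 0, 0, 0, 1]   # the model flagged two employees as leaving

# Of the employees we flagged as leaving, how many actually left?
print(precision_score(y_test, y_pred))  # 1 correct of 2 flagged -> 0.5
```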

While precision tells us about prediction quality, it doesn't reveal the full story. This moderate precision score, though not stellar, might be acceptable depending on the business context and the cost of false positives.

Recall, accessed through sklearn's recall_score function, measures our model's ability to catch actual departures. Of the roughly 2,800 employees who actually left the company, we correctly identified only about 26%. This low recall score exposes a critical weakness: when employees did leave, our model usually failed to detect the warning signs.
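A matching recall_score sketch on synthetic labels (again invented for illustration):

```python
from sklearn.metrics import recall_score

# Synthetic labels for illustration: 1 = "left", 0 = "stayed"
y_test = [1, 1, 1, 0, 0]   # three employees actually left
y_pred = [1, 0, 0, 0, 1]   # the model caught only one of them

# Of the employees who actually left, how many did we catch?
print(recall_score(y_test, y_pred))  # 1 of 3 actual departures -> ~0.33
```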

These contrasting metrics—54% precision and 26% recall—paint a clear picture of our model's behavior. Despite achieving 77% overall accuracy, the model developed a conservative bias, becoming highly proficient at predicting retention while systematically missing departures. This imbalance isn't just a statistical curiosity; it has profound implications for practical application.

The choice between optimizing for precision versus recall depends entirely on the cost structure of your errors. Consider medical diagnostics as a stark example: a false negative (telling a sick patient they're healthy) carries far graver consequences than a false positive (unnecessary worry followed by relief). Patients universally prefer the anxiety of a false alarm over the catastrophic oversight of an undiagnosed illness, particularly with contagious diseases or conditions requiring immediate treatment.

In the employment context, the stakes are different but the principle remains. Is it worse to mistakenly think a valuable employee will stay (and fail to intervene) or to unnecessarily worry about someone who's actually committed to the company? The answer shapes your entire modeling approach and determines whether you prioritize catching every potential departure or minimizing false alarms.

This decision becomes even more critical in 2026's competitive talent market, where the cost of losing key employees has skyrocketed. Some organizations might prefer aggressive intervention strategies that accept false positives, while others might focus resources only on high-confidence departure predictions.

The beauty of understanding precision and recall lies in recognizing that 77% accuracy, while respectable, tells an incomplete story. Our model's skewed performance toward one class represents both a limitation and an opportunity for targeted improvement. As you fine-tune your approach, consider not just whether your model is right or wrong, but which direction you want it to err when it inevitably makes mistakes.

Key Takeaways

1. Precision measures the accuracy of positive predictions: of all predicted departures, how many were correct (54% in this case)
2. Recall measures coverage of actual positives: of all employees who actually left, how many were correctly identified (26% in this case)
3. High overall accuracy (77%) can mask poor performance on minority classes, as the model excels at predicting the majority case (employees staying)
4. The choice between optimizing precision vs recall depends on the relative costs of false positives versus false negatives in your specific context
5. In medical diagnosis scenarios, false negatives are typically more dangerous than false positives due to untreated disease progression and contagion risks
6. Model bias toward one class (predicting 'stayed' vs 'left') is common and requires conscious adjustment based on business priorities
7. sklearn provides built-in precision_score and recall_score functions that accept actual labels and predictions for easy metric calculation
8. Effective model tuning involves deliberately choosing which direction you want the model to err when it makes mistakes
