April 2, 2026 · Colin Jaffe · 4 min read

Neural Network Predictions: Accuracy and Fine-Tuning

Master prediction analysis and model optimization techniques

Understanding Neural Network Output

Neural networks return probability arrays for each prediction, showing confidence levels across all possible classes rather than just a single answer.

Analyzing Model Predictions

1. Generate Predictions: use the model.predict() method on the normalized testing images to get a probability array for each prediction.

2. Interpret Probabilities: each prediction returns 10 probability values representing confidence for digits 0-9, with values like 0.99 indicating 99% confidence.

3. Format for Readability: convert the raw floats to percentages and round to two decimal places using a list comprehension for easier analysis.

4. Extract Predicted Classes: use np.argmax() to find the index of the highest probability value, which represents the model's final prediction.
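The four steps above can be sketched end to end. The array below is a hand-written stand-in for one row of model.predict() output (the trained model itself comes from earlier in the course); the individual values are illustrative:

```python
import numpy as np

# Stand-in for one row of model.predict(testing_images):
# ten probabilities, one per digit class 0-9 (illustrative values).
probs = np.array([1.13e-7, 2.0e-8, 4.0e-4, 1.0e-7, 3.0e-8,
                  5.0e-8, 1.0e-8, 9.996e-1, 2.0e-8, 1.0e-8])

# Step 3: convert each float to a rounded percentage for readability.
formatted = [round(float(p * 100), 2) for p in probs]
print(formatted)  # [0.0, 0.0, 0.04, 0.0, 0.0, 0.0, 0.0, 99.96, 0.0, 0.0]

# Step 4: the index of the largest probability is the predicted class.
predicted_digit = int(np.argmax(probs))
print(predicted_digit)  # 7
```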

Key Prediction Analysis Techniques

Probability Interpretation

Raw neural network outputs are probability distributions across all classes. Values like 1.13e-7 represent extremely low confidence, while 0.99 indicates high confidence.

Argmax Function

np.argmax() returns the index of the maximum value in an array, helping identify the model's top prediction from probability distributions.

Batch Processing

List comprehensions enable efficient processing of multiple predictions simultaneously, converting raw outputs to readable formats for analysis.

Example Prediction Confidence Distribution

Digit 0: 0%
Digit 1: 0%
Digit 2: 0.04%
Digit 3: 0%
Digit 4: 0%
Digit 5: 0%
Digit 6: 0%
Digit 7: 99.96%

Manual Prediction Verification

Pros:
- Provides detailed insight into model confidence levels
- Allows identification of uncertain predictions with low confidence
- Helps understand the decision-making process of neural networks
- Enables spot-checking of individual predictions for accuracy

Cons:
- Time-consuming for large datasets with thousands of predictions
- Prone to human error when manually counting indices
- Limited scalability compared to automated evaluation metrics
- May not reveal systematic patterns across the entire test set

Model Performance Snapshot

- 120 consecutive correct predictions
- 99.96% confidence on the example prediction
- 10 digit classes predicted
- 0.04% second-highest confidence
Exceptional Model Performance

Achieving 100% accuracy on the first 120 test samples indicates a highly effective neural network, though comprehensive evaluation requires analyzing the complete test dataset.

Next Steps for Model Evaluation


This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Now let's examine the predictions our model generates and analyze their structure. In our upcoming neural networks deep-dive, we'll conduct a comprehensive accuracy analysis at scale. For now, we'll focus on understanding the prediction output itself and what it reveals about our model's decision-making process.

We'll create a predictions variable by calling our model.predict method on the normalized testing images. This computational step requires the model to process each test image through its trained neural network layers, which takes a moment to complete.
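A shape-oriented sketch of that step, using a random stand-in for model.predict(testing_images) since the trained model and test set come from earlier in the course:

```python
import numpy as np

# Stand-in for predictions = model.predict(testing_images):
# one row of ten class probabilities per test image.
rng = np.random.default_rng(0)
raw = rng.random((10000, 10))
predictions = raw / raw.sum(axis=1, keepdims=True)  # each row sums to 1

print(predictions.shape)  # (10000, 10)
print(round(float(predictions[0].sum()), 6))  # 1.0
```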

The execution completes in roughly one second, which is brisk considering the model runs a full forward pass over every test image.

Let's examine the raw output structure. When we print the first testing value using predictions at index zero, we encounter a dense array of floating-point numbers that initially appears cryptic. These values represent probability distributions across our ten possible digit classifications.

The scientific notation reveals telling patterns: 1.13 × 10⁻⁷ represents extremely low confidence (essentially zero), while 0.99 indicates 99% confidence; a value printed as 9.99 × 10⁻¹ is the same kind of high-confidence prediction, just written in scientific notation. This raw format, while mathematically precise, requires interpretation to become actionable.
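Formatting one of these values makes the notation concrete (the number is the low-confidence value quoted above):

```python
value = 1.13e-7  # as printed in the raw prediction array

# The same number as a plain decimal and as a percentage.
print(f"{value:.10f}")        # 0.0000001130
print(f"{value * 100:.8f}%")  # 0.00001130%
```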

This probability array represents the model's confidence distribution across all ten possible digits (0-9). Most values hover near zero, some as small as a few millionths of a percent, indicating the model's strong conviction that the image doesn't represent those particular digits. The index position corresponds directly to the digit value: index 0 represents the digit zero, index 1 represents one, and so forth.

By manually counting through the array positions—zero, one, two, three, four, five, six, seven—we can identify that the model exhibits 99.96% confidence that our first test image represents the digit seven. This level of certainty suggests robust feature recognition within our trained network.


To make these predictions more readable, we'll implement a formatting transformation using Python list comprehension. This approach converts the raw probability values into a percentage format with appropriate decimal precision, making the results more intuitive for analysis.

Our formatting function applies three transformations: converts each prediction to float type, multiplies by 100 for percentage representation, and rounds to two decimal places for clean presentation. The syntax: round(float(prediction * 100), 2) handles this conversion elegantly.
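The lesson's expression, applied to an illustrative single prediction row (the values stand in for predictions[0]):

```python
# One prediction row (illustrative values standing in for predictions[0]).
prediction_row = [1.13e-7, 0.0, 4.0e-4, 0.0, 0.0, 0.0, 0.0, 9.996e-1, 0.0, 0.0]

# Float conversion, scale to percent, round to two decimal places.
formatted = [round(float(prediction * 100), 2) for prediction in prediction_row]
print(formatted)  # [0.0, 0.0, 0.04, 0.0, 0.0, 0.0, 0.0, 99.96, 0.0, 0.0]
```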

A common implementation pitfall occurs with array handling—we need to specify predictions[0] rather than the entire predictions array, since we're examining a single prediction rather than all 10,000 test results simultaneously.

The formatted output reveals a clear decision pattern: 0%, 0%, 0.04%, 0%, 0%, 0%, 0%, 99.96%, 0%, 0%. Counting through positions zero through seven, we confirm 99.96% confidence for digit seven, with only a marginal 0.04% possibility of digit two. This decisive probability distribution indicates strong model performance.

To verify our manual counting, we can leverage NumPy's argmax function, which returns the index of the highest value in an array. Using np.argmax(predictions[0]) programmatically confirms our prediction: seven. This eliminates human counting errors and provides reliable index identification.
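In isolation, with the same illustrative row as above:

```python
import numpy as np

# Illustrative probabilities; index 7 holds the largest value.
probs = [0.0, 0.0, 0.0004, 0.0, 0.0, 0.0, 0.0, 0.9996, 0.0, 0.0]
print(int(np.argmax(probs)))  # 7
```

Note that np.argmax returns the index of the first maximum if several entries tie, which is rarely an issue with decisive probability distributions like this one.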

We can validate this prediction against our ground truth labels. Checking testing_labels[0] confirms the correct answer was indeed seven, demonstrating accurate model prediction for this sample.


For broader accuracy assessment, we'll generate predicted digits for multiple samples using list comprehension. The expression converts each prediction array into its most likely digit classification: [int(np.argmax(prediction)) for prediction in predictions]. This creates a clean list of predicted digit values for comparison.

Similarly, we'll format our correct answers for direct comparison: [int(label) for label in testing_labels]. This parallel structure enables systematic accuracy evaluation across our test dataset.
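Both comprehensions side by side, with small illustrative stand-ins for predictions and testing_labels:

```python
import numpy as np

# Two illustrative prediction rows and their ground-truth labels.
predictions = np.array([
    [0.01, 0.97, 0.02, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.95, 0.03, 0.02, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])
testing_labels = np.array([1, 0])

predicted_digits = [int(np.argmax(prediction)) for prediction in predictions]
correct_digits = [int(label) for label in testing_labels]

print(predicted_digits)  # [1, 0]
print(correct_digits)    # [1, 0]
```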

Examining the first 30 predictions against their correct answers reveals perfect accuracy—every single prediction matches its corresponding label. This pattern continues through samples 30-60, maintaining flawless performance across our initial evaluation set.

Extending our analysis through samples 60-90 and 90-120 continues to show perfect accuracy. The model correctly identified all 120 examined samples, suggesting exceptionally robust performance on our handwritten digit recognition task. This level of precision indicates our neural network has successfully learned to distinguish subtle features that differentiate each digit class.
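The slice-by-slice spot check can also be automated. A sketch with fabricated lists in which every prediction matches its label, mirroring the 120-sample result described here:

```python
import numpy as np

rng = np.random.default_rng(0)
# Fabricated stand-ins: 120 labels, and predictions that match them all.
correct_digits = rng.integers(0, 10, size=120).tolist()
predicted_digits = list(correct_digits)

# Compare 30 samples at a time, as in the manual inspection.
for start in range(0, 120, 30):
    pred = predicted_digits[start:start + 30]
    true = correct_digits[start:start + 30]
    matches = sum(p == t for p, t in zip(pred, true))
    print(f"samples {start}-{start + 30}: {matches}/30 correct")
```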

This remarkable accuracy demonstrates the power of well-trained neural networks for image classification tasks. In our next lesson, we'll move beyond manual spot-checking to implement comprehensive accuracy metrics that provide statistical confidence in our model's performance across the entire test dataset. We'll also explore new problem domains and examine fine-tuning techniques—including the critical balance between optimization and overfitting that can make or break production machine learning systems.

Key Takeaways

1. Neural networks output probability distributions across all possible classes, providing confidence levels rather than just final predictions
2. Raw prediction outputs require formatting and interpretation, with techniques like np.argmax() to extract the most likely class
3. Manual verification of predictions provides valuable insights but becomes impractical for large-scale evaluation
4. High-confidence predictions (99.96%) combined with very low alternatives (0.04%) indicate strong model certainty
5. Perfect accuracy on initial samples suggests excellent model performance, though comprehensive testing is required
6. List comprehensions and array operations enable efficient batch processing of prediction results
7. Proper model evaluation requires moving beyond manual inspection to systematic metrics and measurement frameworks
8. Understanding prediction confidence helps identify uncertain cases that may require additional attention or model refinement
