April 2, 2026 · Colin Jaffe · 3 min read

Neural Network Training Process for Digit Recognition

Understanding how neural networks learn to recognize handwritten digits

Training Data Scale

60,000
training images
10,000
testing images
784
pixels per image
28×28
image dimensions
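These numbers can be sanity-checked with NumPy placeholder arrays of the stated shapes (the variable names are illustrative; real data would come from an MNIST loader):

```python
import numpy as np

# Placeholder arrays with the shapes described above (real pixel data
# would come from an MNIST dataset loader).
train_images = np.zeros((60000, 28, 28))  # 60,000 training images
test_images = np.zeros((10000, 28, 28))   # 10,000 testing images

# Each 28x28 image flattens into a 784-value input vector for the network.
flat_train = train_images.reshape(60000, 784)
print(flat_train.shape)  # (60000, 784)
print(28 * 28)           # 784
```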
The Training Conversation

Think of neural network training as a conversation between you and the model: you show it thousands of examples with correct answers, and it learns to identify patterns that will help it recognize new, unseen examples.

Neural Network Training Process

1

Data Input

Feed 60,000 training images with labels to the model

2

Pattern Recognition

Model identifies pixel patterns and relationships for each digit

3

Self-Improvement

Network adjusts weights and tests accuracy through repetition

4

Weight Adjustment

Model determines pixel importance and refines connections

5

Testing Phase

Evaluate performance on 10,000 unseen test images
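The five steps above can be sketched as a miniature training loop. This is a simplified softmax classifier on tiny synthetic data, not the course's actual model; the sizes, learning rate, and epoch count are illustrative so the sketch runs instantly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for real data: random "images" and digit labels.
X = rng.normal(size=(200, 784))     # step 1: data input (flattened pixels)
y = rng.integers(0, 10, size=200)   # correct labels, digits 0-9

W = np.zeros((784, 10))             # weights start uninformative
b = np.zeros(10)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(200):                      # step 3: repetition
    probs = softmax(X @ W + b)                # step 2: score pixel patterns
    onehot = np.eye(10)[y]
    grad = X.T @ (probs - onehot) / len(X)
    W -= 0.1 * grad                           # step 4: weight adjustment
    b -= 0.1 * (probs - onehot).mean(axis=0)

# Step 5 would evaluate on held-out images; here we just check that the
# model learned to fit its training examples.
accuracy = (softmax(X @ W + b).argmax(axis=1) == y).mean()
```

The real pipeline differs mainly in scale (60,000 images, hidden layers, many more parameters), but the loop structure is the same.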

Key Neural Network Capabilities

Self-Training

The network continuously improves through repetition and self-evaluation. It adjusts its internal parameters to achieve better accuracy.

Weight Adjustment

The model dynamically determines pixel importance and relationships. It decides which features matter most for digit recognition.

Confidence Scoring

Networks provide probability scores for each prediction. They can express uncertainty when digits are ambiguous.

"It will adjust its knobs and dials. It'll apply different weights to the hidden layer neurons. It'll decide: okay, maybe this pixel is a little less important, this pixel's a little more important."
This describes how neural networks automatically optimize themselves by adjusting the importance of different inputs during training.
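That "knobs and dials" intuition can be made concrete with a toy two-pixel example. Here one pixel genuinely tracks the label and the other is pure noise; after repeated gradient updates (a simplified logistic unit, with illustrative sizes), the informative pixel earns a large weight while the noise pixel's weight stays near zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pixel 0 is diagnostic (it tracks the label); pixel 1 is pure noise.
labels = rng.integers(0, 2, size=500)            # binary labels
pixel0 = labels + 0.1 * rng.normal(size=500)     # informative pixel
pixel1 = rng.normal(size=500)                    # irrelevant pixel
X = np.column_stack([pixel0, pixel1])

w = np.zeros(2)
for _ in range(200):                             # repeated adjustment
    pred = 1 / (1 + np.exp(-(X @ w)))            # sigmoid output
    w -= 0.1 * X.T @ (pred - labels) / len(X)    # "turn the knobs"

# The diagnostic pixel ends up with a much larger weight.
print(abs(w[0]) > abs(w[1]))  # True
```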

Typical Confidence Distribution

High Confidence (90-99%): 85% of predictions
Medium Confidence (70-89%): 12% of predictions
Low Confidence (50-69%): 3% of predictions
Understanding Confidence Scores

When a zero looks like a six, the model might output 53% confidence for zero and 47% for six. It chooses zero but tells you exactly how certain it was about that decision.
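That 53%/47% split can be reproduced with a softmax over two raw scores. The logit values below are chosen purely to make the example come out this way; a real network would produce them from the pixel data:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

# Illustrative raw scores for the two competing digits; the small gap
# between them yields a near-even confidence split.
logits = np.array([0.12, 0.0])   # score for "zero", score for "six"
conf = softmax(logits)
prediction = ["zero", "six"][int(np.argmax(conf))]
print(prediction, conf.round(2))  # zero [0.53 0.47]
```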

Neural Network Prediction Approach

Pros
Provides confidence scores for all possible outcomes
Can express uncertainty in ambiguous cases
Often achieves very high accuracy, with 99%+ confidence on clear examples
Transparent about prediction certainty
Cons
May still make mistakes on edge cases
Confidence doesn't always correlate with correctness
Requires large training datasets for reliability

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Before we dive into data normalization and model training, let's examine the fundamental learning process that makes neural networks so powerful. We'll feed our model 60,000 training examples—each a 28×28 pixel array representing handwritten digits—along with their corresponding labels. Think of this as an intensive apprenticeship where we're essentially telling the model: "Study this handwritten eight carefully. Memorize the precise patterns within these 784 pixel values that define an eight. Now examine this five—notice how its structure differs fundamentally from the eight you just studied. With 60,000 examples spanning all ten digits, you're building a comprehensive visual vocabulary."

The model's task extends far beyond rote memorization. It must identify subtle patterns, spatial relationships, and distinguishing features that separate a curved eight from an angular four, or a closed six from an open five. After this intensive training phase, we'll challenge the model with 10,000 completely new images—and expect it to achieve roughly 98% accuracy through pattern recognition alone.

This learning occurs through iterative refinement, one of the most fascinating aspects of modern neural network architectures. The model begins with random weights and biases, performing barely better than chance. But through thousands of training cycles, it systematically improves its internal representations, adjusting the importance it assigns to different pixels and their relationships.

We can observe this self-improvement process in real-time as the model essentially conducts its own performance reviews. After each training batch, it evaluates its accuracy against known correct answers, then methodically adjusts its internal parameters—increasing weights for pixels that prove diagnostic, reducing emphasis on irrelevant features, and fine-tuning the complex mathematical relationships between input layers and hidden neurons.


The model continuously recalibrates what computer scientists call its "decision boundaries"—the invisible lines that separate one digit classification from another. It might discover that the presence of a closed loop in the upper portion strongly indicates an eight or nine, while vertical lines on the left suggest a one or seven.

These adjustments happen across potentially millions of parameters, each representing the model's evolving understanding of what makes each digit unique. The process continues until the model reaches a performance plateau or we decide the accuracy gains no longer justify additional training time.
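Stopping when accuracy gains no longer justify additional training time can be sketched as a simple plateau check. The accuracy curve below is made up for illustration, and the improvement threshold is an arbitrary choice:

```python
# Illustrative accuracy after each training epoch (made-up values).
accuracies = [0.62, 0.81, 0.90, 0.94, 0.957, 0.960, 0.961, 0.9612, 0.9613]

min_gain = 0.002   # smallest improvement still worth another epoch
best = 0.0
stopped_at = None
for epoch, acc in enumerate(accuracies):
    if acc - best < min_gain:
        stopped_at = epoch   # performance plateau reached: stop training
        break
    best = acc

print(stopped_at)  # 6
```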

What makes neural networks particularly robust is their probabilistic approach to classification. Rather than making binary decisions, they generate confidence distributions across all possible outcomes. A clear, well-formed nine might receive a 99.7% confidence score, while an ambiguous scrawl that could be either a zero or a six might yield a more cautious 53% to 47% split, with the model selecting the higher-probability option while flagging its uncertainty.
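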

This confidence scoring proves invaluable in production environments, where models can defer ambiguous cases to human reviewers or request additional input when certainty falls below acceptable thresholds. It's a level of nuanced decision-making that mirrors human visual processing—we too sometimes squint at unclear handwriting and make our best guess.


Our testing phase validates this entire learning process. We'll present the trained model with 10,000 completely new 28×28 arrays—images it has never encountered during training—and evaluate whether it can successfully apply its learned patterns to novel examples. This represents the true test of generalization: can the model move beyond memorization to genuine pattern recognition?
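Measuring generalization reduces to comparing predictions against the held-out labels. The arrays below are small stand-ins for real model output on the 10,000 test images:

```python
import numpy as np

# Stand-ins for real results: true labels for 10 held-out test images
# and the model's predictions for them (9 correct, 1 wrong).
true_labels = np.array([3, 7, 1, 0, 8, 5, 9, 2, 4, 6])
predictions = np.array([3, 7, 1, 6, 8, 5, 9, 2, 4, 6])  # mistook a 0 for a 6

accuracy = (predictions == true_labels).mean()
print(f"{accuracy:.0%}")  # 90%
```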

The results typically exceed expectations, with well-trained models achieving accuracy rates that rival human performance on similar tasks. This demonstrates the remarkable power of deep learning to extract meaningful insights from high-dimensional data.

With this conceptual framework established, we can now proceed to the practical implementation—normalizing our data and initiating the training process that will bring these concepts to life.

Key Takeaways

1. Neural networks learn through repetition, processing thousands of labeled examples to identify patterns
2. The training process involves 60,000 images for learning and 10,000 separate images for testing accuracy
3. Networks automatically adjust internal weights to determine which pixels and relationships are most important
4. Models provide confidence scores for predictions, often reaching 99%+ confidence on clear examples
5. Self-improvement is built into the training process: networks continuously refine their accuracy through iteration
6. Each image consists of 784 pixels (28×28) that the network must learn to interpret as digits
7. Ambiguous cases result in split confidence scores, like 53% for zero and 47% for six on unclear digits
8. The testing phase uses completely unseen data to validate how well the network generalized from training examples
