April 2, 2026 · Colin Jaffe · 4 min read

Handwritten Digit Recognition with Neural Networks

Building intelligent systems to recognize handwritten digits

MNIST Dataset Overview

60,000
Training images in the dataset
784
Total pixels per image (28×28)
256
Possible grayscale values (0-255)
Understanding Image Data Structure

Each handwritten digit is represented as a 28×28 pixel grid, where each pixel contains a grayscale value from 0 (black) to 255 (white). This creates a total of 784 numerical features that the neural network will learn from.

Data Exploration Process

1. Examine Data Shape

Check the dimensions of training_images to understand the dataset structure: 60,000 samples of 28×28 pixel arrays.

2. Analyze a Single Image

Extract and examine individual images to see the raw numerical representation of handwritten digits.

3. Visualize Image Data

Convert numerical arrays into visual representations using Jupyter Notebook's built-in image display capabilities.

4. Verify Labels

Cross-reference image data with corresponding labels to confirm data integrity and build intuition for the dataset.
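The four steps above can be sketched in a few lines. This assumes `training_images` and `training_labels` have already been loaded (for example from `keras.datasets.mnist`); zero-filled stand-ins with the same shapes are used here so the sketch runs on its own.

```python
import numpy as np

# Zero-filled stand-ins with the real dataset's shapes; in practice these
# come from a loader such as keras.datasets.mnist.
training_images = np.zeros((60000, 28, 28), dtype=np.uint8)
training_labels = np.zeros(60000, dtype=np.uint8)

# Step 1: examine the data shape
print(training_images.shape)        # (60000, 28, 28)

# Step 2: analyze a single image
sample = training_images[0]
print(sample.shape, sample.dtype)   # (28, 28) uint8

# Step 3: visualize -- in a Jupyter cell, the array can be rendered
# as an image (e.g. with matplotlib's imshow).

# Step 4: verify the label for the same index
print(training_labels[0])
```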

Key Components of Digit Recognition

NumPy Arrays

Training images are stored as NumPy arrays with two dimensions representing the 28×28 pixel grid. Each array contains 784 integer values from 0 to 255.

Grayscale Values

Pixel intensity is represented on a scale where 0 equals pure black, 255 equals pure white, and intermediate values represent various shades of gray.

Label Correspondence

Each training image has a corresponding label indicating the correct digit (0-9). This supervised learning approach enables the model to learn digit patterns.

"Our machine learning model is going to have all those lists of lists. It has to look at all these numbers and be able to say, 'Okay, that looks like a zero to me.'"
This highlights the core challenge of digit recognition: teaching a computer to interpret numerical pixel data as meaningful digit patterns, just as humans naturally recognize handwritten numbers.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Let's examine the structure of our training dataset. The training images have a shape of (60,000, 28, 28), indicating 60,000 individual samples, each represented as a 28 × 28 array. Each array corresponds to a 28 × 28 pixel grayscale image of a handwritten digit—a format that has become the standard benchmark for computer vision algorithms since its introduction in the late 1990s.

Each pixel value within these arrays is an integer ranging from 0 to 255, representing the full grayscale spectrum. A value of 0 corresponds to pure black, 255 to pure white, with the 254 intermediate values representing varying shades of gray. This gives us 784 total values per image (28 × 28), creating a rich enough representation to capture the nuances of human handwriting while remaining computationally manageable.
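A quick sanity check confirms the value range and per-image size described above. The array below is a random stand-in with the MNIST layout, since the real loading code isn't shown in this excerpt.

```python
import numpy as np

# Hypothetical stand-in image with the MNIST layout: uint8 values, 28x28 grid.
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

assert image.dtype == np.uint8                  # one byte per pixel
assert image.min() >= 0 and image.max() <= 255  # full grayscale spectrum
print(image.size)                               # 784 values per image (28 * 28)
```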

These 60,000 meticulously labeled images will serve as the foundation for training our neural network. To understand what we're working with, let's examine a single sample by exploring `training_images[0]` in detail.

When we inspect the data type and dimensions of our first training sample, we can verify its structure programmatically. The `type()` function confirms we're working with a NumPy array, while checking the number of dimensions reveals the expected two-dimensional structure of our 28 × 28 image matrix.

Executing `training_images[0]` reveals the underlying data structure: a NumPy array with two dimensions forming our 28 × 28 grid. However, the raw numerical output provides limited insight into the actual image content.
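Those type and dimension checks look like the following, using a synthetic 28 × 28 array in place of `training_images[0]`:

```python
import numpy as np

# Stand-in for training_images[0]; the real sample holds actual pixel data.
sample = np.zeros((28, 28), dtype=np.uint8)

print(type(sample))   # <class 'numpy.ndarray'>
print(sample.ndim)    # 2 -> a two-dimensional 28 x 28 matrix
print(sample.shape)   # (28, 28)
```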


Using `print(training_images[0])` displays the complete array structure, bounded by square brackets that encompass all 28 rows of pixel data. Examining this output reveals a pattern: the initial rows contain only zero values (black pixels), forming the blank margin that typically surrounds a handwritten digit sample. As we progress through the rows, non-zero values begin to appear, representing the actual ink strokes that form the digit.
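The margin pattern can be confirmed programmatically. The hand-built toy "digit" below (a hypothetical block of ink surrounded by zeros) stands in for a real sample:

```python
import numpy as np

# Hand-built toy "digit": zero margins with a block of ink in the middle.
image = np.zeros((28, 28), dtype=np.uint8)
image[8:20, 10:18] = 200   # hypothetical ink strokes

# The first rows are all zeros -- the blank margin around the digit.
print(bool(np.all(image[0:4] == 0)))   # True
# Non-zero values appear where the strokes are.
print(image[10, 10:18])
```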

The emerging pattern of lighter pixel values traces the contours of our handwritten character. These variations in pixel intensity capture the subtle gradations that occur when ink meets paper, preserving the authentic texture of human handwriting within our digital representation.

Modern Jupyter Notebooks offer visualization capabilities that transform raw numerical arrays into intuitive visual representations. When we output the image array without the `print()` function, the notebook automatically renders it as a 28 × 28 pixel image, allowing us to immediately recognize the handwritten digit. The correspondence between the numerical patterns we observed in the raw data and the bright pixels in the rendered image becomes apparent at a glance.
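Outside a notebook, the same visualization can be done explicitly with matplotlib. This sketch uses a headless backend and saves the rendered image to a file instead of displaying it, and again substitutes a toy array for the real sample:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Toy stand-in for training_images[0].
image = np.zeros((28, 28), dtype=np.uint8)
image[8:20, 10:18] = 255

plt.imshow(image, cmap="gray")   # 0 renders black, 255 renders white
plt.axis("off")
plt.savefig("digit.png")
```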

This visualization capability represents a crucial bridge between the mathematical foundations of machine learning and human intuition. Each pixel's intensity value, ranging from 0 to 255, contributes to the overall visual pattern that our neural network must learn to interpret and classify.


Visual inspection confirms that our sample image represents the digit five—a result we can verify against the corresponding ground truth label. By examining `training_labels[0]`, we can confirm that the expected output matches our visual interpretation: the value 5.

Extending our analysis to the first ten samples using `training_labels[0:10]` reveals the diversity within our training set. Each label corresponds to its respective image, creating the supervised learning pairs essential for neural network training. We can visualize additional samples, such as `training_images[1]`, which displays a handwritten zero, further demonstrating the dataset's variety and quality.
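Pairing images with their labels is just parallel indexing. The values below match the commonly reported first ten MNIST training labels, hard-coded here in place of the real `training_labels` array:

```python
import numpy as np

# First ten MNIST training labels, hard-coded as a stand-in.
training_labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=np.uint8)

print(training_labels[0])      # 5 -- the label for training_images[0]
print(training_labels[0:10])   # labels for the first ten samples
```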

This represents the fundamental challenge our machine learning model must solve: given these arrays of 784 numerical values, the algorithm must learn to recognize the underlying patterns that distinguish one digit from another. Unlike human vision, which intuitively processes these patterns as recognizable shapes, our neural network approaches this task through statistical pattern recognition across thousands of examples. In the following sections, we'll explore exactly how this transformation from raw pixels to intelligent classification occurs.
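The "arrays of 784 numerical values" framing corresponds to flattening each 28 × 28 grid into a single feature vector, a common preprocessing step before the data reaches a dense network layer. A small stand-in batch keeps this sketch self-contained:

```python
import numpy as np

images = np.zeros((100, 28, 28), dtype=np.uint8)  # small stand-in batch

# Flatten each 28x28 grid into a 784-value vector and scale the
# 0-255 range into [0, 1] for training.
flat = images.reshape(len(images), 784).astype(np.float32) / 255.0

print(flat.shape)   # (100, 784)
```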

Key Takeaways

1. The MNIST dataset contains 60,000 training images, each represented as a 28×28 pixel array with 784 total numerical values.
2. Grayscale pixel values range from 0 (black) to 255 (white), providing the visual information needed for digit recognition.
3. Each training image is stored as a two-dimensional NumPy array that can be visualized directly in Jupyter Notebooks.
4. Training labels provide the ground truth for supervised learning, indicating which digit each image represents.
5. The neural network must learn to interpret raw numerical pixel data as recognizable digit patterns without inherent understanding of what digits are.
6. Data exploration through printing arrays and visualizing images is crucial for understanding the structure and quality of training data.
7. The challenge lies in teaching machines to recognize patterns that humans intuitively understand from handwritten digit images.
8. Proper data preprocessing and visualization techniques are essential foundations before building and training neural network models.
