April 2, 2026 · Colin Jaffe · 6 min read

Building a Three-Layer Neural Network with Keras and TensorFlow

Build intelligent neural networks with TensorFlow and Keras

Three-Layer Neural Network Components

Input Layer

Flattens 28x28 pixel images into 784 values. Converts human-readable grid format into machine-readable list for optimal processing.

Hidden Layer

Dense layer with 128 neurons processing 784 inputs. Creates 100,352 weighted connections (784 × 128) that form the mysterious "black box" of pattern recognition.

Output Layer

10 neurons representing digits 0-9. Uses softmax activation to produce probability percentages that sum to 100% for final classification.

Neural Network Architecture by Numbers

784
input values from 28x28 image
128
neurons in hidden dense layer
100,352
weighted connections between layers
10
output neurons for digits 0-9

Building Your Keras Sequential Model

1

Create Sequential Model

Initialize a Keras sequential model that processes layers in order from input to output

2

Add Flatten Layer

Convert 28x28 image grid into single 784-value list with specified input shape

3

Configure Dense Layer

Add hidden layer with 128 neurons using ReLU activation for pattern recognition

4

Define Output Layer

Create final dense layer with 10 neurons and softmax activation for digit classification
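The four steps above can be sketched in a few lines. This is a minimal sketch assuming TensorFlow 2.x; the input shape is declared with `tf.keras.Input`, which plays the same role as passing `input_shape=(28, 28)` to the first layer:

```python
import tensorflow as tf

# Step 1: a Sequential model processes layers in order, input to output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),                   # expect 28x28 grayscale images
    tf.keras.layers.Flatten(),                        # step 2: grid -> 784-value vector
    tf.keras.layers.Dense(128, activation="relu"),    # step 3: hidden layer, 100,352 weights
    tf.keras.layers.Dense(10, activation="softmax"),  # step 4: one probability per digit
])

model.summary()  # prints each layer's output shape and parameter count
```

Calling `model.summary()` is a quick sanity check that the layer shapes and parameter counts match what you expect before moving on to compilation.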

Why Flatten Images for Neural Networks

Neural networks process images more efficiently as one-dimensional lists rather than two-dimensional grids. The computer doesn't need spatial relationships between pixels - it learns to weight each of the 784 individual values to recognize patterns.

"Black box" is the term for a system where something is happening inside that we can't really see into.
The hidden dense layer creates 100,352 weighted connections that effectively recognize patterns, but the specific weights remain opaque to humans even as they prove highly effective for the computer.

ReLU vs Traditional Activation Functions

Feature         | ReLU                | Sigmoid
Complexity      | Simple max function | Complex smooth curve
Performance     | Faster processing   | Slower computation
Negative values | Returns 0           | Maps to 0-1 range
Current usage   | Modern standard     | Legacy approach

Recommended: ReLU is preferred for its simplicity and speed; counterintuitively, the simpler function outperforms the traditional sigmoid.

Key Activation Functions Explained

ReLU Function

Returns maximum of input value or 0. Prevents negative confidence from decreasing other digit probabilities, ensuring only positive contributions to classification.

Softmax Function

Scales output values to 0-1 range where all probabilities sum to 100%. Converts raw neural network outputs into interpretable percentage confidence scores.

Example Output Layer Probability Distribution

Digit 0: 2%
Digit 1: 5%
Digit 2: 53%
Digit 3: 1%
Digit 4: 8%
Digit 5: 12%
Digit 6: 3%
Digit 7: 7%
Digit 8: 6%
Digit 9: 3%
Neural Network Architecture Complete

Your three-layer neural network is now built with proper input flattening, hidden layer processing, and probability-based output classification. The next step is compilation and training to make it functional.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Now we'll construct a concise yet sophisticated three-layer neural network, then dive deep into the architecture, functionality, and parameters that make it work. We're building a Keras sequential model—a straightforward architecture where data flows through each layer in order, making it ideal for beginners and many production applications.

Our first layer is the input layer, implemented as a TensorFlow Keras flatten operation. This layer takes our 28×28 pixel images and reshapes them into a format the network can process efficiently. The input_shape parameter of (28, 28) tells the network to expect square images of this dimension—a standard format for the MNIST handwritten digit dataset we're working with.

The second layer represents the heart of our network: a dense (fully connected) hidden layer with 128 neurons. This might seem substantial, but when you consider the computational complexity we're about to explore, you'll realize the true scale of what's happening here. Each neuron uses ReLU (Rectified Linear Unit) activation, a choice that's become the gold standard in modern neural networks for reasons we'll examine shortly.

Our final layer is the output layer—technically another dense layer, but functionally distinct in its purpose. It contains exactly 10 neurons because we're classifying 10 possible outcomes: digits 0 through 9. This layer employs softmax activation, which transforms raw neural outputs into probability distributions that sum to 1.0, giving us interpretable confidence scores for each digit class.

When we pass these layers to Keras, TensorFlow constructs the complete neural network architecture in milliseconds. But understanding what happens beneath the surface requires examining each component in detail.

Let's start with the flatten layer and why it's essential for our architecture. "Flattening" is a fundamental preprocessing step that converts multidimensional arrays into one-dimensional vectors. Our 28×28 image matrix becomes a single array of 784 values, maintaining the exact same pixel data in the same sequence.

While the 2D grid structure helps humans visualize and interpret images, neural networks operate more efficiently with linearized data. The network doesn't need to understand spatial relationships between adjacent pixels—instead, it learns to weight each of the 784 individual pixel values according to their importance for digit classification. This approach allows the network to discover patterns that might not be immediately obvious to human observers, often finding correlations across pixels that aren't spatially adjacent but are mathematically significant.
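Flattening can be seen directly with numpy. The sketch below uses a stand-in image whose pixel values equal their row-major index, so it's easy to check that the order is preserved exactly:

```python
import numpy as np

# Stand-in for one 28x28 grayscale image: each pixel's value is its
# row-major index, so flattening should reproduce 0, 1, 2, ... 783.
image = np.arange(28 * 28).reshape(28, 28)

flat = image.flatten()  # 2D grid -> 1D vector, same values in the same sequence

flat.shape                       # → (784,)
flat[1 * 28 + 1] == image[1, 1]  # pixel (row 1, col 1) lands at index 29 → True
```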


Machine learning systems excel at processing normalized, structured data formats. A simple array of numerical values is computationally optimal, allowing for efficient matrix operations that form the backbone of neural network calculations.

The second layer is where the real complexity—and power—emerges. This dense or hidden layer operates as what practitioners often call a "black box," a system whose internal workings are opaque even to its creators. Here's where the computational scale becomes impressive: our 784 input values connect to 128 neurons, creating 100,352 individual weighted connections (784 × 128).

Each connection has its own weight parameter that the network adjusts during training. The network analyzes patterns like "when pixel 247 has a high value and pixel 156 has a low value, there's a 73% correlation with the digit being a 5." These weights are learned through exposure to thousands of training examples, with the network gradually optimizing each connection to improve classification accuracy.

This is why even major tech companies like Google and Meta sometimes can't fully explain their neural network decisions. The models work exceptionally well—often achieving superhuman performance on specific tasks—but the reasoning involves hundreds of thousands or millions of weighted connections that don't translate into human-interpretable logic. Google's search algorithms, recommendation systems, and language models all operate on this principle: empirical effectiveness over explicable reasoning.

The output layer brings everything together with 10 neurons representing our digit classes. Each neuron receives weighted inputs from all 128 hidden layer neurons (another 1,280 connections), producing raw scores that indicate the network's confidence for each digit. Before we can interpret these scores, they pass through the softmax activation function, which normalizes them into a probability distribution.
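The connection counts in the last two paragraphs can be verified with plain arithmetic. Bias terms, which Keras also trains but the prose doesn't count, are included here for completeness:

```python
hidden_weights = 784 * 128   # flattened input -> hidden layer connections
output_weights = 128 * 10    # hidden layer -> output layer connections
biases = 128 + 10            # one bias per hidden and output neuron

total_params = hidden_weights + output_weights + biases
print(hidden_weights, output_weights, total_params)  # 100352 1280 101770
```

The total, 101,770, is what `model.count_params()` reports for this architecture.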

The result is a set of probabilities that sum to 1.0, answering questions like: "Is this a 3? 89% confident. An 8? 7% confident. A 5? 2% confident." The highest probability wins, but having access to the full distribution provides valuable information about the network's uncertainty and alternative hypotheses.
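Reading a prediction off such a distribution is a one-liner. The sketch below reuses the example distribution shown earlier, where Digit 2 holds 53% of the probability mass:

```python
import numpy as np

# The example output distribution from earlier, as fractions of 1.0
probs = np.array([0.02, 0.05, 0.53, 0.01, 0.08, 0.12, 0.03, 0.07, 0.06, 0.03])

predicted_digit = int(np.argmax(probs))     # index of the highest probability → 2
confidence = float(probs[predicted_digit])  # the winner's probability → 0.53
runner_up = int(np.argsort(probs)[-2])      # second-best hypothesis → digit 5
```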


Understanding activation functions is crucial for neural network design. ReLU (Rectified Linear Unit) has become the dominant choice in modern architectures, despite—or perhaps because of—its elegant simplicity. The function simply returns max(0, n): if the input is positive, it passes through unchanged; if negative, it becomes zero.

This seemingly basic operation solves several critical problems. First, it prevents negative activations from diminishing confidence in other classifications—a neuron that strongly indicates "this isn't a 5" doesn't reduce the probability of it being a 7. Second, ReLU addresses the vanishing gradient problem that plagued earlier activation functions like sigmoid and tanh, allowing networks to train more effectively across many layers.

ReLU's computational efficiency also matters at scale. Unlike sigmoid functions that require expensive exponential calculations, ReLU operations are nearly instantaneous. When you're processing millions of parameters across thousands of training iterations, this efficiency compounds significantly. Modern alternatives like Leaky ReLU, ELU, and Swish have emerged, but ReLU remains the reliable default for most applications.
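The function itself is a one-line numpy sketch, max(0, n) applied elementwise:

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: positive inputs pass through unchanged,
    # negative inputs are clamped to zero.
    return np.maximum(0, x)

relu(np.array([-2.0, -0.5, 0.0, 0.5, 3.0]))  # → [0., 0., 0., 0.5, 3.]
```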

Softmax activation serves a different but equally important role in the output layer. Raw neural network outputs can be any real number—positive, negative, large, or small. Softmax transforms these raw logits into a proper probability distribution where all values fall between 0 and 1 and sum to exactly 1.0.

The mathematical elegance of softmax lies in its ability to amplify differences between competing classes while maintaining probabilistic interpretation. A raw output of [2.1, 1.8, 0.3] becomes approximately [0.52, 0.39, 0.09] after softmax, clearly indicating the network's preference while preserving the relative confidence levels. This makes softmax indispensable for multi-class classification problems where you need both a decision and a confidence measure.
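A minimal numpy sketch of softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the resulting probabilities unchanged:

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability (cancels out after normalizing),
    # exponentiate, then divide by the total so the outputs sum to 1.
    shifted = np.exp(logits - np.max(logits))
    return shifted / shifted.sum()

probs = softmax(np.array([2.1, 1.8, 0.3]))
print(np.round(probs, 2))  # [0.52 0.39 0.09]
# probs.sum() is 1.0 up to floating-point rounding
```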

With our architecture defined and its components understood, we're ready to move beyond static structure into dynamic training. The next phase involves compiling the model with optimization algorithms and loss functions, then feeding it data to learn from—transforming our carefully designed but untrained network into a working digit classifier.


Key Takeaways

1. Sequential Keras models process layers in order: flatten input, dense hidden layer, and output classification layer for digit recognition
2. Flattening converts 28x28 image grids into 784-value lists because neural networks process one-dimensional data more efficiently than spatial grids
3. Hidden dense layers with 128 neurons create 100,352 weighted connections that form an opaque but effective pattern recognition system
4. The ReLU activation function simply returns the maximum of the input value and zero, preventing negative confidence from leaking into other classes while being faster to compute than the sigmoid
5. Output layers use softmax activation to scale 10 neurons into probabilities that sum to 100% for digit classification from 0-9
6. The black-box nature means even companies like Google can't fully explain how their neural network weights produce effective results
7. Neural networks learn to weight individual pixel values rather than spatial relationships, making flattened input optimal for machine processing
8. Building the network architecture is just the first step; compilation and training are required to make the model functional for digit recognition
