Skip to main content
Colin Jaffe/3 min read

Mode: Finding the Most Common Value in Data Analysis

Mode Use Cases

Categorical Data

Most frequent category — what mean and median can't tell you.

Filling Missing Values

Common imputation strategy for categorical features.

Pandas .mode()

df['col'].mode() returns most common value(s).

Multiple Modes

Bimodal data has two modes — both returned in pandas.

Master Machine Learning at Noble Desktop

Noble Desktop's Python Machine Learning Bootcamp covers scikit-learn, Keras, neural networks, and applied ML.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Calculate the mode to identify the most frequently occurring value in a dataset. Watch this tutorial to learn the key concepts and techniques.

Let's take a look at another statistical measurement we could take, and that is the mode. Mode is another kind of average, where mean gives you the mathematical middle number, and median gives you the actual middle value. Mode gives you the most typical value, the number that shows up the most.

And it might not be very predictive, but it is sort of a good way to look back at data and say, hey, which one of these numbers is your typical number? Which one of these numbers is the one, if I had to pick a number for it to be, not for it to be near, but an actual value from the list that is most likely to be, it's the mode. It's the one that shows up the most. Now, mode is an interesting one as well, because it doesn't have to be numerical, right? You can look at the most, the mode letter, what is the letter that shows up the most? Or the mode string, or the mode name.

It could be any kind of value, not necessarily number, because there's no math here. It's simply looking at what are the unique values and which one shows up the most. Let's take a look at how we could calculate it.

For grades, we could eyeball it, for sure. But as we look, when we turn this to year, to the year resale value, it's not as easily eyeballed when you have a real dataset. But if we're looking at this, we could say, I want to calculate the mode.

And the way we would do that is we, let's actually take a, let's actually make a mode grade value. Let's make a variable called that. And let's set it to be our calculation of mode.

Now, mode is not built into Python, nor is it built into NumPy. But it is built into our library that we imported above, stats. From SciPy.

Stats.mode. And then we could pass in not degrees, grades. And let's look at mode grade. Now, what we got back was not a number.

And it's kind of an interesting one. It actually gave us back two values. We're going to take a what that is in a second, brush up on our Python on it.

It gave us back 85. That is the number that shows up the most. And the count of it.

How many times did it show up? Just two. So, mode grade returns you a special kind of value called, or stats.mode returns you a value called a tuple. A kind of value called a tuple.

And we'll take a look at what that is in a moment.