January 13, 2025 (Updated April 19, 2026) | Colin Jaffe | 5 min read

Creating a DataFrame with Iris Dataset

Iris DataFrame Build

Import sklearn

`from sklearn.datasets import load_iris` (the DataFrame step also needs `import pandas as pd`).

Load Data

`iris = load_iris()` loads the built-in dataset bundled with scikit-learn.

Build DataFrame

`df = pd.DataFrame(iris.data, columns=iris.feature_names)`

Add Target

`df['target'] = iris.target` adds the numeric labels (0, 1, 2). Now you have a complete labeled DataFrame.
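Put together, the four steps above might look like the following minimal sketch. It adds `import pandas as pd`, which the `pd.DataFrame` call assumes, and stores the numeric labels in a `target` column, matching the walkthrough below.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the built-in Iris dataset (a Bunch with .data, .target, .feature_names, .target_names)
iris = load_iris()

# Build a DataFrame from the 150 x 4 feature array, using the feature names as columns
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Attach the numeric labels (0, 1, 2) as a target column
df['target'] = iris.target

print(df.head())
```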


This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Convert the Iris dataset into a pandas DataFrame, map numerical targets to species names, and add this as a new column. Watch this tutorial to learn the key concepts and techniques.

Let's start turning this into a DataFrame we can work with. First, let's look at one more attribute, `iris_data.target_names`: setosa, versicolor, and virginica. We'll use those shortly. We can also look at `iris_data.feature_names`: sepal length, sepal width, petal length, and petal width, all in centimeters.
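To see those two attributes yourself, a short sketch like this prints them, assuming the dataset is loaded into a variable named `iris_data` as in the lesson.

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# Label names, ordered so that positions 0, 1, 2 match target values 0, 1, 2
print(iris_data.target_names)
# ['setosa' 'versicolor' 'virginica']

# Feature names, which we'll reuse as DataFrame column names
print(iris_data.feature_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
```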

Those will be our column names, so that this becomes a proper dataset. We'll create `iris_df` as a pandas DataFrame whose data is `iris_data.data` and whose column names are `iris_data.feature_names`, and then look at the result.

All right, pandas has split the data into four columns; remember that each row of `iris_data.data` is an array of four measurements. The columns are named sepal length, sepal width, petal length, and petal width, and there are 150 rows in total.
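Here is a minimal sketch of that DataFrame step, with a shape check confirming the 150 rows and four columns described above.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()

# Each row of iris_data.data is an array of four measurements;
# pandas spreads those four values across the four named columns
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

print(iris_df.shape)  # (150, 4)
print(iris_df.head())
```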

Next, let's add the target. Right now we can't tell which of these flowers is setosa, versicolor, or virginica. Before adding the target as a column, which is our goal here, let's look at `iris_data.target`: it's an array of zeros, ones, and twos.

Those numbers stand for setosa, versicolor, and virginica, in the same order as `target_names`. To get them onto our data, we add a new column, `iris_df['target']`, set it equal to `iris_data.target`, and then look at `iris_df` again.
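A minimal sketch of adding the target column follows. It relies on `iris_data.target` having one label per row, in the same order as `iris_data.data`.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

# One integer label (0, 1, or 2) per row, aligned with the rows of the DataFrame
iris_df['target'] = iris_data.target

print(iris_df.shape)  # (150, 5)
```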

You can see there's now a `target` column: it starts at zero and ends with all twos, because the rows are grouped by species. That's all well and good, but I'm definitely going to forget which species is zero, which is one, and which is two. I don't know about you, but I won't keep that in mind.
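To confirm that grouping, you could inspect the start and end of the column and count each label. This sketch uses `head`, `tail`, and `value_counts`, which the lesson itself doesn't call.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['target'] = iris_data.target

# Rows are grouped by class: zeros first, then ones, then twos
print(iris_df['target'].head(3).tolist())  # [0, 0, 0]
print(iris_df['target'].tail(3).tolist())  # [2, 2, 2]

# Number of flowers per class label
print(iris_df['target'].value_counts().sort_index())
```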

In fact, I don't remember it now. So we're going to make a `species` column, and to do that we need to go through the `target` column.

For each value, we translate the number (zero, one, or two) into a flower name. The flower names live in `iris_data.target_names`, an array holding setosa, versicolor, and virginica at indices zero, one, and two, so we can use each target number as an index into that array to get the species.
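The lookup itself is plain array indexing. This sketch shows each target number used as an index into `target_names`.

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# A target value is simply an index into target_names
for target_number in (0, 1, 2):
    print(target_number, iris_data.target_names[target_number])
# 0 setosa
# 1 versicolor
# 2 virginica
```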

A target of zero maps to index zero in `target_names`, a one maps to index one, and a two maps to index two. To do that for every row, we'll use the pandas `apply` method, which takes a function and calls it on each value in the column.

Now we'll do this both with a named function and with a lambda so you can see the different ways to do it. I prefer to start with a regular Python function.

This function takes in a target number and returns the flower name that number maps to. Inside it, I'll create a `flower_name` variable set to the entry of `iris_data.target_names` at that target number. So if `target_number` is zero, `flower_name` is `target_names[0]`, the first entry of setosa, versicolor, virginica, which is setosa.

If `target_number` is one, it's the entry at index one, and so on. We store that string as `flower_name` and return it. Now we can hand that function to pandas to run on every target.
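A sketch of the named function described above, using the `get_flower_name` name the lesson uses when applying it:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

def get_flower_name(target_number):
    # Look up the species name stored at this index of target_names
    flower_name = iris_data.target_names[target_number]
    return flower_name

print(get_flower_name(0))  # setosa
print(get_flower_name(1))  # versicolor
```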

For the first row, pandas runs the function on that row's target number, gets back the flower name, and stores it as that row's value in `iris_df['species']`. In other words, `iris_df['species']` is `iris_df['target']` with our `get_flower_name` function applied. To double-check a few of them, let's use `iris_df.sample(10)` to get 10 random flowers.

To recap: we define `get_flower_name`, apply it to every target value, and save the results as the species column. Let's run that.
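End to end, the apply step might look like this minimal sketch. The `random_state=0` argument to `sample` is an addition, not part of the lesson; it only makes reruns pick the same 10 rows.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['target'] = iris_data.target

def get_flower_name(target_number):
    # Map a numeric target (0, 1, 2) to its species name
    return iris_data.target_names[target_number]

# apply() runs get_flower_name on every value in the target column
iris_df['species'] = iris_df['target'].apply(get_flower_name)

# Ten random rows to spot-check the mapping
print(iris_df.sample(10, random_state=0)[['target', 'species']])
```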

Here are some random rows. The function turned this target into versicolor, this one into setosa, and so on.

And here are some twos, mapped to virginica. Now we have a human-readable species column. If you'd rather use a lambda, we could have skipped defining the function entirely.

If you're comfortable with Python lambdas, this is a good way to do it. Instead of defining the function and then applying it, we could have done it all in one line: `iris_df['species'] = iris_df['target'].apply(lambda target_number: iris_data.target_names[target_number])`.
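As a runnable sketch, the lambda version looks like this. Rows 0, 50, and 100 are printed because each falls in a different species group.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['target'] = iris_data.target

# Same mapping as get_flower_name, written inline as a lambda
iris_df['species'] = iris_df['target'].apply(
    lambda target_number: iris_data.target_names[target_number]
)

print(iris_df[['target', 'species']].iloc[[0, 50, 100]])
```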

This does the same thing as the named function, just more compactly, in one line. Run it and you get the same result.

Which style to use is a matter of preference. Either way, we now have a human-readable set of species.
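To check that the two styles really agree, you could compare their outputs directly. This sketch applies both to the target values and uses `Series.equals`; the variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
targets = pd.Series(iris_data.target, name='target')

def get_flower_name(target_number):
    # Named-function style
    return iris_data.target_names[target_number]

with_function = targets.apply(get_flower_name)
# Lambda style
with_lambda = targets.apply(lambda n: iris_data.target_names[n])

# Both produce identical Series
print(with_function.equals(with_lambda))  # True
```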