April 2, 2026 | Colin Jaffe | 4 min read

Creating a DataFrame with Iris Dataset

Transform Raw Iris Data into Structured DataFrames

Iris Dataset Overview

Total Data Rows: 150
Feature Columns: 4
Flower Species: 3

Dataset Components

Feature Names

Sepal Length, Sepal Width, Petal Length, and Petal Width serve as column identifiers for measurements.

Target Names

Three flower species - Setosa, Versicolor, and Virginica - represent the classification categories.

Creating the Initial DataFrame

1. Load Data: Extract the numerical data from `iris_data.data`, which contains all measurements.

2. Set Column Names: Use `iris_data.feature_names` to create meaningful column headers.

3. Create DataFrame: Combine the data and column names using the `pd.DataFrame` constructor.

Understanding Target Values

The target array contains numerical codes (0, 1, 2) that correspond to flower species, but these numbers are not intuitive for analysis.

Target Value Mapping

Setosa: 0
Versicolor: 1
Virginica: 2

Function vs Lambda Approach

| Feature | Named Function | Lambda Function |
| --- | --- | --- |
| Readability | High - explicit and clear | Medium - compact syntax |
| Reusability | High - can be called elsewhere | Low - inline only |
| Code Length | Multiple lines | Single line |
| Debugging | Easier to debug | Harder to debug |

Recommended: Use named functions for complex logic, lambdas for simple transformations.

Creating Human-Readable Species Column

1. Define Mapping Function: Create a `get_flower_name` function that takes a target number and returns the species name.

2. Apply Transformation: Use the pandas `apply` method to run the function on every target value.

3. Create Species Column: Save the transformed values as a new `species` column in the DataFrame.

Pandas Apply Method

The apply method is powerful for element-wise transformations, allowing you to run custom functions across DataFrame columns efficiently.

Named Function vs Lambda

Pros:
- Named functions provide better code documentation
- Easier to test and debug complex transformations
- Can be reused across multiple DataFrame operations
- More readable for team collaboration

Cons:
- Requires more lines of code for simple operations
- May be overkill for basic transformations
- Creates additional function definitions in namespace

"I'm definitely going to forget which one's zero, which one's one, and which one's two."
This highlights why creating human-readable labels is crucial for data analysis - numerical codes are not intuitive for interpretation.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Let's transform our iris dataset into a structured DataFrame for proper analysis. We'll begin by examining the target classifications—Setosa, Versicolor, and Virginica—which represent the three iris species we're working with. The feature names provide our measurement dimensions: Sepal Length, Sepal Width, Petal Length, and Petal Width.

These features will serve as our column headers. We'll create our DataFrame with `iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)`. This gives us our foundational data structure, with each column labeled by the measurement it holds.
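As a minimal sketch of this step, the snippet below builds the DataFrame end to end. It assumes `iris_data` was loaded with scikit-learn's `load_iris()`, which the lesson's attribute names suggest but this excerpt does not show, and it binds the result to `iris_df`.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: iris_data is the object returned by scikit-learn's load_iris()
iris_data = load_iris()

# Rows come from the measurement array; column labels come from feature_names
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)

print(iris_df.shape)   # (150, 4)
print(iris_df.head())  # first five specimens
```

Note that scikit-learn's labels include units, so the headers appear as `sepal length (cm)`, `sepal width (cm)`, and so on.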

The resulting DataFrame organizes our data into four distinct columns, each row containing the four measurements for a single iris specimen. We now have clearly labeled columns—Sepal Length, Sepal Width, Petal Length, Petal Width—across 150 observations. This structured approach is essential for any serious data analysis workflow.

Now we need to incorporate our target classifications. Currently, we can't distinguish which specimens are Setosa, Versicolor, or Virginica. Before adding our target column, let's examine `iris_data.target`, which contains an array of numerical encodings: zeros, ones, and twos.
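One way to examine the target array before adding it is sketched here. It again assumes `iris_data` comes from scikit-learn's `load_iris()`. The use of `np.unique` to count each code is an illustrative addition, not a step from the lesson.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: iris_data is the object returned by scikit-learn's load_iris()
iris_data = load_iris()

# The raw target array holds one integer code per specimen
print(iris_data.target[:5])

# Count how many specimens carry each code
codes, counts = np.unique(iris_data.target, return_counts=True)
print(dict(zip(codes.tolist(), counts.tolist())))  # {0: 50, 1: 50, 2: 50}
```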

These numerical values correspond directly to our species: Setosa (0), Versicolor (1), and Virginica (2). We'll add this classification data by creating a new column: `iris_df['target'] = iris_data.target`. This gives us our target variable for machine learning applications.
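A short sketch of adding the target column follows, under the same assumption that `iris_data` comes from scikit-learn's `load_iris()` and with the DataFrame bound to `iris_df`. The `value_counts` check is an extra sanity check rather than part of the lesson.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: iris_data is the object returned by scikit-learn's load_iris()
iris_data = load_iris()
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)

# Attach the numeric class codes as a fifth column
iris_df['target'] = iris_data.target

# Sanity check: each code should appear the same number of times
print(iris_df['target'].value_counts().sort_index())
```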

While numerical encoding works perfectly for algorithms, it's problematic for human interpretation and data exploration. Remembering which number corresponds to which species creates unnecessary cognitive overhead and potential for errors in analysis and reporting.


The solution is creating a human-readable `species` column that translates these numerical codes into meaningful species names. We need to map each target value to its corresponding entry in `iris_data.target_names`.

The target_names array contains our species in order: ['setosa', 'versicolor', 'virginica'], indexed as 0, 1, 2 respectively. We'll use this mapping to convert numerical codes to descriptive labels. For each target value, we'll look up the corresponding species name using array indexing.
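The lookup described above can be tried on a single value first. This is a small sketch assuming `iris_data` comes from scikit-learn's `load_iris()`. The variable `first_code` is a hypothetical name used only for illustration.

```python
from sklearn.datasets import load_iris

# Assumption: iris_data is the object returned by scikit-learn's load_iris()
iris_data = load_iris()

# The names are stored in code order, so position 0 is the name for code 0
print(iris_data.target_names)  # ['setosa' 'versicolor' 'virginica']

# Look up the species name for the first specimen's code
first_code = iris_data.target[0]
print(first_code, iris_data.target_names[first_code])  # 0 setosa
```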

This transformation uses Pandas' `apply` method. When called on a single column (a Series), `apply` runs a function on each value and returns the results as a new Series. I'll demonstrate both approaches, using a named function and a lambda expression, so you can choose the style that best fits your coding preferences and team standards.

Let's start with an explicit function approach, which offers better readability and debugging capabilities. We'll create a function called `get_flower_name` that accepts a target number and returns the corresponding species name. The function uses `iris_data.target_names[target_number]` to perform the lookup: when `target_number` is 0, it returns 'setosa'; when it's 1, it returns 'versicolor'; and when it's 2, it returns 'virginica'.

Here's our implementation: the function captures the species name from the target_names array using the numerical index, stores it as `flower_name`, and returns the string value. This approach provides clear, maintainable code that's easy to debug and modify.
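Based on that description, the function might look like the sketch below. The body is reconstructed from the prose rather than copied from the lesson's code, and `iris_data` is assumed to come from scikit-learn's `load_iris()`.

```python
from sklearn.datasets import load_iris

# Assumption: iris_data is the object returned by scikit-learn's load_iris()
iris_data = load_iris()


def get_flower_name(target_number):
    """Return the species name for a numeric target code (0, 1, or 2)."""
    # Use the code as an index into the ordered array of species names
    flower_name = iris_data.target_names[target_number]
    return flower_name


print(get_flower_name(0))  # setosa
print(get_flower_name(2))  # virginica
```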


Now we apply this function across our entire target column: `iris_df['species'] = iris_df['target'].apply(get_flower_name)`. This creates our new species column by transforming every numerical target into its corresponding species name. Let's verify our results with `iris_df.sample(10)` to examine a random subset.
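Put together, the apply step can be sketched as follows. The setup lines repeat the earlier steps so the snippet runs on its own, and they assume `iris_data` comes from scikit-learn's `load_iris()`.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: iris_data is the object returned by scikit-learn's load_iris()
iris_data = load_iris()
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
iris_df['target'] = iris_data.target


def get_flower_name(target_number):
    """Return the species name for a numeric target code."""
    return iris_data.target_names[target_number]


# Run the function on every value in the target column
iris_df['species'] = iris_df['target'].apply(get_flower_name)

# Inspect ten random rows; the selection changes on each run
print(iris_df.sample(10))
```

Because `sample` picks rows at random, passing a fixed `random_state` (for example `iris_df.sample(10, random_state=0)`) makes the output repeatable when comparing results.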

The transformation works: our function converts target codes into readable species names. In the sample we can see 'versicolor' for target 1, 'setosa' for target 0, and 'virginica' for target 2, all lowercase, exactly as they are stored in `iris_data.target_names`. This human-readable format makes the data easier to interpret and reduces the chance of mislabeling a species during analysis.

For those comfortable with Python lambda expressions, we can achieve the same result more concisely. Instead of defining a separate function, we can use: `iris_df['species'] = iris_df['target'].apply(lambda target_number: iris_data.target_names[target_number])`. This one-liner performs identical functionality with reduced code footprint.
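The lambda variant is sketched below in runnable form, with the same setup and the same assumption about `load_iris()`. The final `drop_duplicates` call is an added check, not part of the lesson, that lists each distinct code alongside its name.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: iris_data is the object returned by scikit-learn's load_iris()
iris_data = load_iris()
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
iris_df['target'] = iris_data.target

# Same lookup as the named function, written inline
iris_df['species'] = iris_df['target'].apply(
    lambda target_number: iris_data.target_names[target_number]
)

# One row per distinct (code, name) pair
print(iris_df[['target', 'species']].drop_duplicates())
```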

Both approaches yield identical results—the choice depends on your coding style, team preferences, and maintainability requirements. Named functions offer better debugging and documentation, while lambdas provide conciseness for simple transformations. Regardless of your chosen method, you now have a fully human-readable dataset ready for comprehensive analysis and modeling.

Key Takeaways

1. The Iris dataset contains 150 rows with 4 numerical features measuring sepal and petal dimensions.
2. Target values are encoded as numbers (0, 1, 2) representing Setosa, Versicolor, and Virginica species respectively.
3. Creating a DataFrame requires combining `iris_data.data` with `iris_data.feature_names` for proper column structure.
4. The pandas `apply` method enables element-wise transformations using custom functions or lambda expressions.
5. Human-readable species names are essential for data interpretation and analysis workflows.
6. Named functions offer better readability and debugging capabilities compared to lambda functions for complex operations.
7. The `iris_data.target_names` array provides the mapping from numerical codes to actual flower species names.
8. Adding both numerical target and textual species columns provides flexibility for different analysis approaches.
