Using Boolean Conditions to Filter Data in Python
Master Python Data Filtering with Boolean Conditions
This tutorial covers filtering data frames using Boolean conditions, adding and modifying rows, and extracting statistical insights from your data.
Core Concepts
Boolean Filtering
Use a condition such as calories <= 500 to filter a data frame. Only rows for which the condition evaluates to True are included in the result.
Row Operations
Add a new row by assigning to df.loc[index], or modify an existing row the same way. Use len(df) as the index to place a new row at the end.
Statistical Analysis
The describe() method provides comprehensive statistics including mean, standard deviation, and percentiles for numeric columns.
Boolean Filtering Process
Create New DataFrame
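Assign the filtered result to a new variable such as max_500_cals_df. Note that max_500_cals_df = food_df on its own only gives the same data frame a second name; it is the filter expression on the right-hand side that produces a new data frame.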
Apply Boolean Condition
Use square brackets with condition: food_df[food_df['calories'] <= 500]
Row-by-Row Evaluation
The condition is tested against every row, producing one True or False value per row, and only the rows marked True are kept
Generate Result
All qualifying rows are accumulated into the new data frame
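The four steps above can be sketched as follows, assuming the data frames are pandas DataFrames. The item names and column names come from the tutorial, but the numeric values are placeholders chosen only so that the result matches the tutorial's stated outcome (pizza and steak).

```python
import pandas as pd

# Placeholder values: only the item names and column names are from the
# tutorial; the numbers are illustrative assumptions.
food_df = pd.DataFrame({
    "name": ["pizza", "steak", "hamburger", "garden salad"],
    "price": [12.0, 25.0, 10.0, 9.0],
    "calories": [450, 500, 850, 750],
    "vegan": [False, False, False, True],
})

# The condition on its own yields a Boolean Series: one True/False per row.
mask = food_df["calories"] <= 500

# Indexing with that Series keeps only the rows marked True.
max_500_cals_df = food_df[mask]

print(mask.tolist())                      # [True, True, False, False]
print(max_500_cals_df["name"].tolist())   # ['pizza', 'steak']
```

Writing the mask as a separate variable is optional; food_df[food_df['calories'] <= 500] does the same thing in one expression.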
Filtering Examples from Tutorial
| Condition | Rows Kept | Result |
|---|---|---|
| calories <= 500 | Pizza and steak only | 2 out of 4 items |
| vegan == False | Non-vegan items | 3 out of 4 items (excludes garden salad) |
| price >= 15 | Minimum $15 items | Premium priced items only |
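The other two conditions in the table follow the same pattern. This is a minimal sketch, assuming pandas and using placeholder prices and calorie counts (only the item names, column names, and vegan results are taken from the tutorial).

```python
import pandas as pd

# Placeholder values for illustration; not the tutorial's actual numbers.
food_df = pd.DataFrame({
    "name": ["pizza", "steak", "hamburger", "garden salad"],
    "price": [12.0, 25.0, 10.0, 9.0],
    "calories": [450, 500, 850, 750],
    "vegan": [False, False, False, True],
})

# Keep rows where the vegan column is False.
# For a Boolean column, food_df[~food_df["vegan"]] is an equivalent spelling.
non_vegan_df = food_df[food_df["vegan"] == False]

# Keep rows priced at 15 or more.
min_15_df = food_df[food_df["price"] >= 15]

print(non_vegan_df["name"].tolist())  # ['pizza', 'steak', 'hamburger']
print(min_15_df["name"].tolist())     # ['steak']
```

Each filter returns a new data frame, so food_df still holds all four rows afterwards.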
Use len(df) instead of hard-coding row numbers when adding data. This automatically places new rows at the end, even as your data frame grows.
Adding New Rows
Prepare Data
Create a list with values matching your columns: [name, price, calories, vegan]
Use Dynamic Positioning
Set the location with df.loc[len(df)]. With the default 0-based integer index, len(df) is always the next unused index, so the new row lands at the end.
Assign Values
Set the location equal to your list of values to populate all columns at once
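A short sketch of these three steps, assuming pandas and a data frame with the default integer index. The fruit salad values, like the other numbers here, are made up for illustration.

```python
import pandas as pd

# Placeholder starting data; only names and columns come from the tutorial.
food_df = pd.DataFrame({
    "name": ["pizza", "steak", "hamburger", "garden salad"],
    "price": [12.0, 25.0, 10.0, 9.0],
    "calories": [450, 500, 850, 750],
    "vegan": [False, False, False, True],
})

# 1. Prepare a list in column order: [name, price, calories, vegan].
new_row = ["fruit salad", 8.0, 300, True]

# 2. len(food_df) is 4 here, the next unused index label.
next_index = len(food_df)

# 3. Assigning the list to that location fills every column at once.
food_df.loc[next_index] = new_row

print(len(food_df))                # 5
print(food_df.loc[4, "name"])      # fruit salad
```

If the list has a different number of values than the data frame has columns, pandas raises an error instead of adding a partial row.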
Row Modification Approaches
Statistical Measures Explained
With a mean of 637 calories and a standard deviation of 250, one standard deviation on either side of the mean spans 387 to 887 calories. If the values are roughly normally distributed, about 68% of food items fall within that range.
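The range arithmetic can be checked directly, and describe() can supply the mean and standard deviation for your own data. This sketch assumes pandas; the calorie list is a placeholder, not the tutorial's data, so its statistics will differ from the 637 and 250 quoted above.

```python
import pandas as pd

# The tutorial's quoted figures: mean 637, standard deviation 250.
mean_cal, std_cal = 637, 250
low, high = mean_cal - std_cal, mean_cal + std_cal
print(low, high)  # 387 887

# The same calculation driven by describe(), on placeholder values.
calories = pd.Series([450, 500, 850, 750], name="calories")
summary = calories.describe()  # count, mean, std, min, 25%, 50%, 75%, max

band_low = summary["mean"] - summary["std"]
band_high = summary["mean"] + summary["std"]

# Share of items that actually fall within one standard deviation.
share_within = calories.between(band_low, band_high).mean()
print(share_within)
```

Measuring the share directly is a useful check, because the 68% figure is a rule of thumb for bell-shaped data and small samples can deviate from it.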
Data Frame Evolution in Tutorial
Initial Setup
Started with 4 food items (pizza, steak, hamburger, garden salad)
First Addition
Added fruit salad at index 4 using df.loc
Price Manipulation
Doubled all prices, then cut them in half to return to normal
Bulk Addition
Added 3 items using a loop (chicken salad, chef salad, big kahuna)
Name Changes
Modified chef salad to house salad, then to shrimp salad
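The sequence above could look like the following sketch, assuming pandas. The item names are from the tutorial; every price, calorie count, and vegan flag is a placeholder, and the new_items list is a hypothetical name introduced here.

```python
import pandas as pd

# Initial setup: four items (values are illustrative placeholders).
food_df = pd.DataFrame({
    "name": ["pizza", "steak", "hamburger", "garden salad"],
    "price": [12.0, 25.0, 10.0, 9.0],
    "calories": [450, 500, 850, 750],
    "vegan": [False, False, False, True],
})

# First addition: fruit salad lands at index 4.
food_df.loc[len(food_df)] = ["fruit salad", 8.0, 300, True]

# Price manipulation: double every price, then halve to restore it.
food_df["price"] = food_df["price"] * 2
food_df["price"] = food_df["price"] / 2

# Bulk addition: len(food_df) is re-evaluated on each pass of the loop.
new_items = [
    ["chicken salad", 11.0, 400, False],
    ["chef salad", 12.0, 550, False],
    ["big kahuna", 18.0, 1200, False],
]
for item in new_items:
    food_df.loc[len(food_df)] = item

# Name changes: select rows with a Boolean condition, then set one column.
food_df.loc[food_df["name"] == "chef salad", "name"] = "house salad"
food_df.loc[food_df["name"] == "house salad", "name"] = "shrimp salad"

print(food_df["name"].tolist())
```

Selecting the row by its name rather than by a hard-coded index keeps the rename correct even if rows are added or reordered earlier in the script.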
Best Practices for Data Frame Operations
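Use len(df) when adding rows: this prevents index conflicts as your data grows
Test filters on a small sample first: verify your logic before applying it to large data
Refer to columns by name: column names are more readable and maintainable than indices
Run describe() on your data: statistical summaries reveal data patterns and outliers
Assign filtered results to a new data frame: this preserves the original data while creating focused subsets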
This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam).
Key Takeaways