Refining Data: Removing Outliers for Improved Model Training
Enhance Model Performance Through Strategic Data Filtering
Outliers can significantly skew machine learning models, leading to poor generalization and inaccurate predictions on new data.
Outlier Removal Process
Filter by Price Threshold
Remove cars priced above 80 thousand. This eliminates 2 high-priced outliers from the dataset.
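A minimal pandas sketch of this price filter, under stated assumptions: the DataFrame name `cars`, the column names `price` and `engine_size`, and the toy values are all hypothetical stand-ins, not the lesson's actual dataset. Prices are assumed to be stored in thousands.

```python
import pandas as pd

# Toy data for illustration only; names and values are assumptions.
# Prices are assumed to be in thousands.
cars = pd.DataFrame({
    "price": [21.5, 27.3, 85.5, 33.4, 82.6, 18.9],
    "engine_size": [1.8, 3.2, 5.0, 3.5, 5.5, 2.0],
})

# A boolean mask keeps rows at or below the threshold,
# which drops every car priced above 80.
cars_filtered = cars[cars["price"] <= 80]

print(len(cars), "->", len(cars_filtered))  # 6 -> 4
```

Assigning the result to a new name keeps the original frame available for before-and-after comparisons.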
Filter by Engine Size
Remove cars with an engine size greater than 7. This eliminates 1 additional outlier with an unusually large engine.
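The engine-size filter follows the same pattern. The sketch below uses a hypothetical `cars` DataFrame with assumed column names `price` and `engine_size`, and also shows how both conditions could be combined in one step. Each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

# Toy data for illustration only; names and values are assumptions.
cars = pd.DataFrame({
    "price": [21.5, 27.3, 45.7, 33.4, 69.7],
    "engine_size": [1.8, 3.2, 8.0, 3.5, 5.7],
})

# Drop cars whose engine size exceeds 7.
cars_filtered = cars[cars["engine_size"] <= 7]

# Equivalent single-step form that applies both thresholds at once.
both = cars[(cars["price"] <= 80) & (cars["engine_size"] <= 7)]

print(len(cars), "->", len(cars_filtered))  # 5 -> 4
```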
Redeclare Variables
Use the filtered data to recreate the X and Y variables so that model training no longer sees the removed rows.
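A sketch of why the variables have to be rebuilt: X and y created before filtering still hold the outlier rows, so they must be derived again from the filtered frame. The column names, the `horsepower` feature, and the choice of `price` as the target are assumptions made for illustration.

```python
import pandas as pd

# Toy data for illustration only; names and values are assumptions.
cars = pd.DataFrame({
    "price": [21.5, 27.3, 85.5, 33.4, 45.7],
    "engine_size": [1.8, 3.2, 5.0, 3.5, 8.0],
    "horsepower": [140, 225, 302, 210, 450],
})

cars_filtered = cars[(cars["price"] <= 80) & (cars["engine_size"] <= 7)]

# Rebuild features and target from the filtered frame so both
# contain exactly the same surviving rows.
X = cars_filtered[["engine_size", "horsepower"]]
y = cars_filtered["price"]

print(X.shape, y.shape)  # (3, 2) (3,)
```

Because X and y come from the same filtered frame, their indexes stay aligned row for row.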
Data Reduction Through Filtering
Before vs After Outlier Removal
| Feature | Before Filtering | After Filtering |
|---|---|---|
| Total Rows | 153 | 150 |
| Max Price Threshold | No limit | ≤ 80k |
| Max Engine Size | No limit | ≤ 7.0 |
| Data Quality | Contains outliers | Outliers removed |
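A before-and-after comparison like the one in the table can be checked in code by counting how many rows each condition removes. This is a sketch on toy data with assumed column names; the counts it prints describe the toy frame, not the lesson's dataset.

```python
import pandas as pd

# Toy data for illustration only; names and values are assumptions.
cars = pd.DataFrame({
    "price": [21.5, 85.5, 33.4, 82.6, 45.7, 18.9],
    "engine_size": [1.8, 5.0, 3.5, 5.5, 8.0, 2.0],
})

too_expensive = cars["price"] > 80
too_large = cars["engine_size"] > 7

# Keep rows that trip neither condition.
cars_filtered = cars[~(too_expensive | too_large)]

summary = {
    "rows_before": len(cars),
    "removed_by_price": int(too_expensive.sum()),
    # Count only rows not already removed by the price filter.
    "removed_by_engine_size": int((too_large & ~too_expensive).sum()),
    "rows_after": len(cars_filtered),
}
print(summary)
```

If the removed counts do not add up to the difference between the row totals, a filter is probably being applied to the wrong frame.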
Next Steps for Model Training
Ensure feature and target variables reflect the cleaned dataset
Maintain proper data separation for unbiased evaluation
Allow the model to learn from a data distribution without extreme values
Evaluate improvements in accuracy and generalization
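These steps could be sketched with scikit-learn as follows. The source does not name a model, so `LinearRegression`, the single `engine_size` feature, and the synthetic data standing in for an already-filtered dataset are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an already-filtered dataset (assumption).
rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 6.0, size=60)
price = 5 + 8 * engine_size + rng.normal(0, 3, size=60)
cars_filtered = pd.DataFrame({"engine_size": engine_size, "price": price})

# Features and target come from the cleaned frame.
X = cars_filtered[["engine_size"]]
y = cars_filtered["price"]

# Hold out a test set so evaluation uses rows the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R^2 on the held-out rows.
r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.3f}")
```

Running the same split and model on the unfiltered data gives a baseline to compare the held-out score against.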
Key Takeaways