Preparing Titanic Dataset: Splitting and Scaling Techniques
Data Prep Pipeline
Handle Missing Values
Impute Age with the median, drop or fill Cabin, and fill the few missing Embarked values with the most common port.
Encode Categoricals
One-hot encode Sex and Embarked. Use pd.get_dummies().
Train/Test Split
Use train_test_split from sklearn.model_selection. An 80/20 split is typical.
Scale Features
Fit StandardScaler on the training set only, then use it to transform both the training and test sets.
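The four steps above can be sketched end to end as follows. This is a minimal illustration, not the lesson's own code: the small DataFrame is a hypothetical stand-in for the Titanic data, and dropping Cabin, filling Embarked with its most common value, scaling every column, and using random_state=42 are all assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical ten-row stand-in for the Titanic data (the real set is larger).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
    "Pclass":   [3, 1, 3, 1, 3, 3, 1, 3, 2, 3],
    "Sex":      ["male", "female", "female", "female", "male",
                 "male", "male", "female", "female", "male"],
    "Age":      [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0, 27.0, 14.0, None],
    "Fare":     [7.25, 71.28, 7.93, 53.10, 8.05, 8.46, 51.86, 11.13, 30.07, 7.90],
    "Cabin":    [None, "C85", None, "C123", None, None, "E46", None, None, None],
    "Embarked": ["S", "C", "S", "S", "S", "Q", "S", None, "C", "S"],
})

# 1. Handle missing values.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
df = df.drop(columns=["Cabin"])  # assumption: drop rather than fill

# 2. One-hot encode the categorical columns.
df = pd.get_dummies(df, columns=["Sex", "Embarked"], dtype=int)

# 3. Separate features from the target, then split 80/20.
X = df.drop(columns=["Survived"])
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Fit the scaler on the training rows only, then transform both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training rows alone keeps the test set's mean and variance out of the preprocessing, so the test score remains an honest estimate of performance on unseen data.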
Noble Desktop's Python Machine Learning Bootcamp covers scikit-learn, Keras, neural networks, and applied ML.
This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam).
Next, we're going to split our data. We don't actually need to split it quite as much as we normally do.