Colin Jaffe/1 min read

Analyzing Titanic Survival: Impact of Gender and Embarkation Port

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Analyze Titanic survival rates by gender and port of embarkation to identify influential factors. Watch this tutorial to learn the key concepts and techniques.

Let's look at two more features that we think might be important. First, we'll plot survival by gender. To do that, we call `sns.countplot` and store the axes it returns, with `x` set to survived, `hue` set to sex, and `data` still set to our Titanic data.
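
Here is a minimal sketch of that plotting step, assuming seaborn and matplotlib are installed. The tutorial's actual DataFrame isn't shown, so the small `titanic_data` table below is a hypothetical stand-in that uses lowercase column names like seaborn's built-in `titanic` dataset. The `plot_survival_by` helper is also hypothetical; it just wraps `sns.countplot` so the same call works for any categorical column, including the embarkation port used later in the lesson.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy stand-in for the tutorial's Titanic data (not real passenger records).
titanic_data = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0, 1, 0, 1, 0],
    "sex": ["male", "female", "female", "male", "male", "female", "male", "male", "female"],
    "embarked": ["S", "C", "S", "Q", "S", "C", "S", "Q", "S"],
})

def plot_survival_by(df, feature):
    """Draw bar counts of survived (0/1), split into colored bars by `feature`."""
    fig, ax = plt.subplots()
    sns.countplot(x="survived", hue=feature, data=df, ax=ax)
    ax.set_title(f"Titanic survival by {feature}")
    return ax

ax = plot_survival_by(titanic_data, "sex")            # survival by gender
ax_port = plot_survival_by(titanic_data, "embarked")  # survival by port of embarkation
```

Passing `ax=ax` keeps each chart on its own figure, so the two plots don't draw over each other.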

Here we have Titanic survival by gender: men perished at a higher rate, and women survived at a higher rate.
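
The bar chart shows counts, so one way to confirm the "higher rate" reading is to compute the rates directly. This is a sketch on hypothetical toy rows, not the real dataset; because `survived` is coded 0/1, its mean within each group equals that group's survival rate.

```python
import pandas as pd

# Hypothetical toy rows; in practice this would be the full Titanic DataFrame.
titanic_data = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0, 1, 0, 1, 0],
    "sex": ["male", "female", "female", "male", "male", "female", "male", "male", "female"],
})

# Mean of a 0/1 column per group = fraction of that group who survived.
survival_rate_by_sex = titanic_data.groupby("sex")["survived"].mean()
print(survival_rate_by_sex)
```

On these toy rows the female rate is 0.75 and the male rate is 0.2; the real figures will differ.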

So this also seems like an important feature. Finally, we'll look at the port of embarkation, meaning the port where each passenger boarded the Titanic. We'll do the same kind of thing.

We'll create our countplot with `x` set to survived, `hue` set to embarked, and `data` still set to our Titanic data. Let's see what that looks like. Of the three ports, S (Southampton), Q (Queenstown), and C (Cherbourg), about two-thirds of the passengers who boarded at S perished, Q had a death rate of roughly 60 to 70 percent, and C was the only port where survivors outnumbered those who died.
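
Reading percentages off bar heights is approximate, so here is a sketch that computes the per-port split with `pd.crosstab`. The rows are hypothetical toy data, so the printed proportions will not match the real dataset. `normalize="index"` converts each port's counts into proportions that sum to 1 across its row.

```python
import pandas as pd

# Hypothetical toy rows; the real dataset has many more passengers.
titanic_data = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0, 1, 0, 1, 0],
    "embarked": ["S", "C", "S", "Q", "S", "C", "S", "Q", "S"],
})

# Spell out the port codes used in the embarked column.
port_names = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Rows = port, columns = outcome; each row's proportions sum to 1.
rates = pd.crosstab(
    titanic_data["embarked"].map(port_names),
    titanic_data["survived"].map({0: "died", 1: "survived"}),
    normalize="index",
)
print(rates)
```

The `died` column gives each port's death rate directly, which is the number the bar chart asks you to estimate by eye.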

This also seems like an important factor. We'll look at how to combine these features further in the next video.