April 2, 2026 | Colin Jaffe | 3 min read

Analyzing Titanic Data: Combining Class and Gender for Insights

Advanced pandas techniques for meaningful data exploration

Understanding Composite Variables

Creating composite variables like p-class sex allows analysts to examine the interaction between multiple categorical variables simultaneously, revealing patterns that might be hidden when examining variables in isolation.

Titanic Dataset Variables

Passenger Class

Numeric values 1, 2, or 3 representing first, second, and third class accommodations respectively. This socioeconomic indicator directly influenced survival rates.

Gender (Sex)

String values indicating passenger gender. Historical maritime protocol prioritized women during evacuations, making this a critical survival factor.

Survival Status

Binary outcome variable indicating whether passengers survived the disaster. This serves as our target variable for predictive modeling.

Creating Composite Variables in Pandas

1. Define Category Values

Establish all possible combinations beforehand to ensure consistency and proper ordering in your categorical variable.

2. Concatenate Columns

Use string concatenation with separators, ensuring data type compatibility by converting numeric values to strings using astype(str).

3. Create Categorical Type

Convert the new column to pandas categorical type with predefined categories for better memory efficiency and ordered operations.
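The three steps above can be sketched end to end on a small toy DataFrame. The six rows are made up for illustration and are not the lesson's dataset; the column names pclass and sex and the variable categories_list follow the code quoted later in this lesson, while the exact label spelling (1_female and so on) is an assumption derived from the '_' separator.

```python
import pandas as pd

# Toy stand-in for the Titanic data (illustrative rows only).
titanic_data = pd.DataFrame({
    "pclass": [3, 1, 3, 2, 1, 2],
    "sex": ["male", "female", "female", "male", "male", "female"],
})

# Step 1: list every class/gender combination in the order we want to see it.
categories_list = [f"{c}_{s}" for c in (1, 2, 3) for s in ("female", "male")]

# Step 2: turn the numeric class into text, then join it to sex with a separator.
titanic_data["p_class_sex"] = (
    titanic_data["pclass"].astype(str) + "_" + titanic_data["sex"]
)

# Step 3: convert to a categorical that uses the predefined categories.
titanic_data["p_class_sex"] = pd.Categorical(
    titanic_data["p_class_sex"], categories=categories_list
)

print(titanic_data["p_class_sex"].cat.categories.tolist())
```

Building the category list from the same class and gender values used in the concatenation keeps the labels and the data in sync.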

Survival Rates by Class and Gender

First Class Female: 91
Second Class Female: 70
Third Class Female: 72
First Class Male: 45
Second Class Male: 15
Third Class Male: 10

Class Advantage Breakdown

The data reveals that by third class, gender advantages were significantly diminished. Third class females had equal numbers of survivors and casualties (72 each), showing how socioeconomic factors could override traditional maritime protocols.

Gender vs Class Impact Analysis

| Class | Female Passengers | Male Passengers |
| First Class | 96.8% survived | 36.9% survived |
| Second Class | 92.1% survived | 15.7% survived |
| Third Class | 50.0% survived | 13.5% survived |

Gender provided a significant survival advantage in every class, but class determined the baseline survival probability.

Key Survival Statistics

First class females perished: 3
First class females survived: 91
Second class females perished: 6
Third class females survived: 72

"Third class males did very poorly. Barely any of them survived."
This stark observation highlights how the intersection of gender and socioeconomic status created dramatically different survival outcomes, with third class males facing the worst odds of survival.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

We'll now perform advanced pandas DataFrame manipulation to create a composite feature called "p-class sex"—a powerful analytical construct that combines passenger class (first, second, or third) with gender. This feature engineering technique allows us to examine the intersection of socioeconomic status and gender in survival patterns, revealing insights that neither variable could provide independently.

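Begin by defining our categorical framework with six distinct values: first class female, first class male, second class female, second class male, third class female, and third class male. These represent all possible combinations of our two variables and will serve as our controlled categories for analysis. In code, the labels must match the strings produced by the concatenation step character for character (for example 1_female rather than first class female), because the categorical conversion turns any value that is not in the category list into a missing value.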

Next, we'll construct this composite column by concatenating the pclass and sex values. Execute this with: titanic_data['p_class_sex'] = titanic_data['pclass'].astype(str) + '_' + titanic_data['sex']. The critical step here is converting the numeric pclass values (1, 2, 3) to strings using astype(str), so they can be joined with the already-string sex variable. Without the conversion, pandas tries to add integers to strings element by element and raises a TypeError.
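To see why astype(str) matters, here is a minimal sketch on a two-row toy DataFrame (the rows are illustrative only). It attempts the concatenation without the conversion, records what happens, and then repeats it with the conversion.

```python
import pandas as pd

df = pd.DataFrame({"pclass": [1, 3], "sex": ["female", "male"]})

# Without astype(str): integer + string is not a valid element-wise operation.
try:
    df["pclass"] + "_" + df["sex"]
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
print(outcome)

# With astype(str): plain string concatenation, row by row.
joined = df["pclass"].astype(str) + "_" + df["sex"]
print(joined.tolist())
```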

The final transformation converts our new column into a pandas categorical data type, a best practice that reduces memory use and restricts the column to a known set of values. Execute: titanic_data['p_class_sex'] = pd.Categorical(titanic_data['p_class_sex'], categories=categories_list). The category list also fixes the order used when sorting, grouping, and plotting; passing ordered=True additionally enables comparisons between categories.
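The behavior of pd.Categorical is easier to judge on a tiny example. The sketch below uses four illustrative labels and an assumed categories_list spelled to match the '_' concatenation. It shows the integer codes behind the labels, what happens when the category list does not match the data, and what ordered=True adds.

```python
import pandas as pd

labels = pd.Series(["3_male", "1_female", "3_female", "1_female"])
categories_list = ["1_female", "1_male", "2_female", "2_male", "3_female", "3_male"]

# Each value is stored as a small integer code pointing into categories_list.
as_category = pd.Categorical(labels, categories=categories_list)
codes = as_category.codes.tolist()
print(codes)

# Labels that are absent from the category list silently become missing values.
mismatched = pd.Categorical(labels, categories=["first class female", "third class male"])
all_missing = bool(mismatched.isna().all())
print(all_missing)

# ordered=True allows min/max and comparisons that follow the declared order.
ordered = pd.Categorical(labels, categories=categories_list, ordered=True)
print(ordered.min(), ordered.max())
```

The second case is worth checking after any conversion: a quick isna().sum() on the new column reveals a spelling mismatch between the data and the category list.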

Examining our newly created series reveals the expected combinations: third class male, first class female, third class female, and others distributed across our 891 total observations. This structured approach to feature engineering demonstrates how thoughtful data preparation can unlock deeper analytical insights.

With our composite feature ready, we can now visualize these intersectional survival patterns using Seaborn's sophisticated plotting capabilities. The count plot with survival status on the x-axis and our p-class sex feature as the hue parameter reveals stark disparities that individual variables alone couldn't illuminate.
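A count plot along these lines might look like the following sketch. It assumes a 0/1 column named survived and uses a six-row toy DataFrame rather than the real data; the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Toy rows for illustration; the real dataset has 891 observations.
titanic_data = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "pclass": [3, 1, 3, 2, 2, 1],
    "sex": ["male", "female", "female", "male", "female", "male"],
})
categories_list = [f"{c}_{s}" for c in (1, 2, 3) for s in ("female", "male")]
titanic_data["p_class_sex"] = pd.Categorical(
    titanic_data["pclass"].astype(str) + "_" + titanic_data["sex"],
    categories=categories_list,
)

# One group of bars per survival outcome, one bar color per class/gender category.
ax = sns.countplot(data=titanic_data, x="survived", hue="p_class_sex")
ax.set_title("Passenger counts by survival, class, and gender")
```

Because the hue column is categorical, the legend follows the order of categories_list rather than the order in which values happen to appear in the data.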

The visualization exposes dramatic survival inequalities across our six categories. Third-class males experienced devastating mortality rates—barely any survived the disaster. Second-class males fared similarly poorly, highlighting how gender and class intersected fatally for men in lower passenger classes. The contrast with female passengers is striking: first-class women achieved remarkable survival rates with only three fatalities against 91 survivors. Second-class females also demonstrated strong survival advantage with just six deaths versus 70 survivors.

Most revealing is the third-class data, where the gender advantage shrinks sharply: 72 deaths and 72 survivors among third-class women is exact parity. This suggests that by third class, socioeconomic disadvantage began to outweigh gender-based survival priority. The "women and children first" protocol apparently held strongest among higher passenger classes, where social status reinforced traditional evacuation priorities.
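The female counts quoted above convert directly into survival rates. This short sketch does the arithmetic in pandas, using only the survivor and death counts stated in the text; the column names are chosen for illustration.

```python
import pandas as pd

# Survivor and death counts for female passengers, as quoted in the text.
female_counts = pd.DataFrame(
    {"survived": [91, 70, 72], "died": [3, 6, 72]},
    index=["first class", "second class", "third class"],
)

# Rate = survivors / (survivors + deaths), expressed as a percentage.
total = female_counts["survived"] + female_counts["died"]
female_counts["survival_rate_pct"] = (100 * female_counts["survived"] / total).round(1)

print(female_counts)
```

On the full DataFrame, and assuming survived is coded 0/1, grouping by the composite column and taking the mean of survived should yield the same rates for all six categories.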

This intersectional analysis demonstrates the power of composite features in revealing complex relationships within historical data. As we transition to machine learning applications, this engineered feature will likely prove highly predictive—capturing nuanced survival patterns that simpler models might miss. Our next phase involves preparing this enriched dataset for algorithmic analysis, beginning with a random forest classifier that can leverage these multi-dimensional insights for sophisticated predictive modeling.
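As a preview of that preparation step, one common way to hand a categorical feature to a tree-based model is one-hot encoding. The sketch below uses pd.get_dummies on a four-row toy DataFrame; this is an assumed approach for illustration, since the lesson has not yet shown which encoding it uses.

```python
import pandas as pd

categories_list = [f"{c}_{s}" for c in (1, 2, 3) for s in ("female", "male")]

# Toy rows for illustration only.
titanic_data = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "p_class_sex": pd.Categorical(
        ["3_male", "1_female", "3_female", "2_male"], categories=categories_list
    ),
})

# One indicator column per category; categories absent from the data still get a column.
features = pd.get_dummies(titanic_data[["p_class_sex"]])
target = titanic_data["survived"]

print(features.columns.tolist())
```

Declaring the categories up front means the encoded feature matrix has the same six columns no matter which combinations appear in a given sample, which keeps training and test data aligned.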

Key Takeaways

1. Composite variables created by combining passenger class and gender reveal interaction effects between socioeconomic status and demographic factors
2. Data type conversion using astype(str) is essential when concatenating numeric and string columns in pandas
3. Categorical data types in pandas provide memory efficiency and enable ordered operations for better analysis
4. First class females had the highest survival rate with only 3 deaths compared to 91 survivors
5. Third class passengers showed the most equality between genders, with third class females having a 50-50 survival rate
6. Male passengers across all classes had significantly lower survival rates, with third class males faring the worst
7. Seaborn count plots effectively visualize survival patterns across multiple categorical variables simultaneously
8. Proper data preparation including categorical encoding is crucial before implementing machine learning models like random forest classifiers
