Data Frames: Concatenating Columns for Effective Splitting
Master DataFrame Concatenation for Machine Learning Workflows
This article assumes you have already created one-hot encoded columns (high, low, medium) from a categorical salary column and are now ready to combine them with your original DataFrame.
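To follow along, here is a minimal sketch of that starting point. It assumes the encoding was done with pandas' get_dummies function. The satisfaction column and all sample values are hypothetical.

```python
import pandas as pd

# Hypothetical employee data; only the "salary" column matters here.
df = pd.DataFrame({
    "satisfaction": [0.38, 0.80, 0.11, 0.72],
    "salary": ["low", "medium", "medium", "high"],
})

# One-hot encode salary into separate high/low/medium indicator columns.
# dtype=int gives 0/1 values instead of booleans.
salary_dummies = pd.get_dummies(df["salary"], dtype=int)

print(salary_dummies.columns.tolist())  # ['high', 'low', 'medium']
```

get_dummies sorts the category labels, so the columns come out in alphabetical order.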
DataFrame Concatenation Essentials
Column Addition
Concatenating along the column axis adds the new columns to the right side of your existing DataFrame. All original data is preserved, and the new features sit alongside it.
Training Preparation
The concatenated DataFrame is ready for train-test splitting because every feature, old and new, now lives in a single structure.
Data Integrity
Column-wise concatenation aligns rows by index label, not by position. As long as both DataFrames share the same index, each observation keeps its complete feature set across the old and new columns.
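Alignment works by index label rather than by position, so a mismatched index silently produces extra rows filled with NaN. The sketch below uses hypothetical data and an artificially shifted index to show the problem. It also shows one common fix: resetting both indexes.

```python
import pandas as pd

df = pd.DataFrame({"salary": ["low", "medium", "high"]})
dummies = pd.get_dummies(df["salary"], dtype=int)

# Simulate an index mismatch, e.g. dummies built from a filtered or
# re-indexed copy of the data (hypothetical scenario).
misaligned = dummies.set_axis([10, 11, 12])

# Rows are matched by index label, so no labels overlap and the
# result has the union of both indexes: 6 rows, half of them NaN.
bad = pd.concat([df, misaligned], axis=1)
print(bad.shape)  # (6, 4)

# Resetting both indexes lines the labels up again. This is only safe when
# both DataFrames list the same observations in the same order.
good = pd.concat(
    [df.reset_index(drop=True), misaligned.reset_index(drop=True)], axis=1
)
print(good.shape)  # (3, 4)
```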
DataFrame Concatenation Process
Prepare DataFrames
Have your original DataFrame and the new one-hot encoded columns ready as separate DataFrames that need to be combined.
Use CONCAT Function
Call pandas' concat function (pd.concat) with a list containing both DataFrames: the original DataFrame and the one-hot encoded salary DataFrame.
Specify Column Axis
Set axis=1 (equivalently, axis="columns") so the new columns are added to the right side rather than appended as rows at the bottom.
Assign Result
Assign the result back to your DataFrame variable. pd.concat returns a new DataFrame rather than modifying the original in place, so without this assignment the combined structure is lost.
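Combined, the four steps might look like this in pandas. This is a sketch: the data is hypothetical, and get_dummies stands in for however the one-hot columns were actually created.

```python
import pandas as pd

# Step 1: the original DataFrame (hypothetical data) and its one-hot columns.
df = pd.DataFrame({
    "satisfaction": [0.38, 0.80, 0.11],
    "salary": ["low", "medium", "high"],
})
salary_dummies = pd.get_dummies(df["salary"], dtype=int)

# Steps 2-3: pass both DataFrames in a list and concatenate along columns.
# Step 4: pd.concat returns a new DataFrame, so assign it back to df.
df = pd.concat([df, salary_dummies], axis=1)

print(df.columns.tolist())
# ['satisfaction', 'salary', 'high', 'low', 'medium']
```

The list order controls the column order: columns from the first DataFrame come first, followed by the columns from the second.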
Row vs Column Concatenation
| Feature | Row Concatenation | Column Concatenation |
|---|---|---|
| Direction | Vertical (bottom) | Horizontal (right) |
| Result Position | New rows at bottom | New columns on right |
| Use Case | Adding more observations | Adding more features |
| For One-Hot Encoding | Incorrect approach | Correct approach |
If you omit the axis argument, pd.concat defaults to row concatenation (axis=0). The high, low, and medium values are then stacked below the original rows instead of sitting beside them as new columns on the right, and every cell without a matching column is filled with NaN.
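Comparing shapes makes the difference clear. In this hypothetical three-row example, the default call doubles the row count. With axis=1, the row count stays at three and three columns are added.

```python
import pandas as pd

df = pd.DataFrame({"salary": ["low", "medium", "high"]})
salary_dummies = pd.get_dummies(df["salary"], dtype=int)

# Default axis=0: dummy rows are stacked underneath the original rows,
# and cells with no matching column are filled with NaN.
stacked = pd.concat([df, salary_dummies])
print(stacked.shape)  # (6, 4)

# axis=1: dummy columns are placed to the right of the original columns.
side_by_side = pd.concat([df, salary_dummies], axis=1)
print(side_by_side.shape)  # (3, 4)
```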
Post-Concatenation Verification
Verify that the number of columns has increased by exactly the number of one-hot columns (three: high, low, medium)
Confirm that high, low, and medium columns appear in the DataFrame
Ensure the number of rows remains unchanged after column concatenation
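Check that the new columns align with existing rows, with no NaN values introduced by mismatched indexes
Decide whether to keep or drop the original salary column for training, since the one-hot columns already encode the same information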
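Each item in the checklist can be turned into an assertion. This sketch uses hypothetical data. The final lines drop salary and perform a simple 80/20 random split with DataFrame.sample. That split is a pure-pandas stand-in, not necessarily the method the course uses; scikit-learn's train_test_split is a common alternative.

```python
import pandas as pd

original = pd.DataFrame({
    "satisfaction": [0.38, 0.80, 0.11, 0.72, 0.37],
    "salary": ["low", "medium", "medium", "high", "low"],
})
salary_dummies = pd.get_dummies(original["salary"], dtype=int)
df = pd.concat([original, salary_dummies], axis=1)

onehot = ["high", "low", "medium"]
# Column count grew by exactly the number of one-hot columns.
assert df.shape[1] == original.shape[1] + salary_dummies.shape[1]
# The high/low/medium columns are present.
assert set(onehot).issubset(df.columns)
# Row count is unchanged.
assert len(df) == len(original)
# No NaNs were introduced (NaNs would signal mismatched indexes).
assert not df[onehot].isna().any().any()
# Each row's active indicator column matches its salary label.
assert (df[onehot].idxmax(axis=1) == df["salary"]).all()

# Optionally drop the original text column before training.
features = df.drop(columns="salary")

# Simple random 80/20 split using pandas only (illustrative stand-in).
train = features.sample(frac=0.8, random_state=42)
test = features.drop(train.index)
print(len(train), len(test))
```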
This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam).
Key Takeaways