April 2, 2026 / Colin Jaffe / 3 min read

Data Frames: Concatenating Columns for Effective Splitting

Master DataFrame Concatenation for Machine Learning Workflows

One-Hot Encoding Context

This article assumes you have already created one-hot encoded columns (high, low, medium) from a categorical salary column and are now ready to combine them with your original DataFrame.
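For readers who want to follow along, here is a minimal sketch of that starting point using `pd.get_dummies`. The small DataFrame below is a hypothetical stand-in for the HR data: the `satisfaction_level` column and all values are illustrative assumptions, not the course's actual dataset.

```python
import pandas as pd

# Hypothetical stand-in for the HR data (column names and values are assumptions).
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "salary": ["low", "medium", "medium", "high"],
})

# One-hot encode the salary column. Passing a Series (not the whole DataFrame)
# names the new columns after the categories, sorted alphabetically:
# high, low, medium. dtype=int produces 0/1 integers instead of booleans.
salary_one_hot = pd.get_dummies(df["salary"], dtype=int)

print(salary_one_hot)
```

Each row of `salary_one_hot` has exactly one 1, marking that employee's salary level.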

DataFrame Concatenation Essentials

Column Addition

Concatenating along the column axis adds the new columns to the right side of your existing DataFrame. This preserves all original data while incorporating the new features.

Training Preparation

The concatenated DataFrame becomes ready for train-test splitting. All necessary columns are now available in a single structure.

Data Integrity

Column-wise concatenation matches rows by index label, so when both DataFrames share the same index, each observation retains its complete feature set across old and new columns.
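To see what row alignment means in practice, the sketch below uses two small hypothetical frames (names and values are assumptions). `pd.concat(..., axis=1)` pairs rows by index label, not by position, so if one frame loses a row, the gap shows up as NaN rather than shifting values onto the wrong observation.

```python
import pandas as pd

# Hypothetical frames that share the index labels 0, 1, 2.
features = pd.DataFrame({"satisfaction_level": [0.38, 0.80, 0.11]})
one_hot = pd.DataFrame({"high": [0, 0, 1], "low": [1, 0, 0], "medium": [0, 1, 0]})

# Matching index labels: every row's features line up with its indicators.
combined = pd.concat([features, one_hot], axis=1)

# If row 1 is dropped from only one frame, pandas still aligns on labels
# and fills the missing cell (satisfaction_level for label 1) with NaN.
misaligned = pd.concat([features.drop(index=1), one_hot], axis=1)

print(combined.shape)                 # (3, 4)
print(misaligned.isna().sum().sum())  # 1
```

A NaN appearing after a column-wise concat is therefore a useful signal that the two indexes no longer match.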

DataFrame Concatenation Process

1. Prepare DataFrames: Have your original DataFrame and the new one-hot encoded columns ready as separate DataFrames that need to be combined.
2. Use the `pd.concat` function: Pass it a list containing both DataFrames, the original DataFrame and the salary one-hot encoding DataFrame.
3. Specify the column axis: Set `axis=1` so the new columns are added to the right side rather than appended as rows at the bottom.
4. Assign the result: Assign the concatenation result back to your DataFrame variable to update it with the combined structure.
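The four steps above can be sketched in pandas as follows. The HR-style data is hypothetical (column names and values are assumptions); only the `pd.concat([...], axis=1)` call and the reassignment correspond directly to the steps.

```python
import pandas as pd

# Step 1: prepare the two DataFrames (hypothetical HR-style data).
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "salary": ["low", "medium", "medium", "high"],
})
salary_one_hot = pd.get_dummies(df["salary"], dtype=int)

# Steps 2-4: pass both frames to pd.concat in a single list,
# concatenate along the column axis, and assign the result back to df.
df = pd.concat([df, salary_one_hot], axis=1)

print(df.columns.tolist())
# ['satisfaction_level', 'salary', 'high', 'low', 'medium']
```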

Row vs Column Concatenation

Feature | Row Concatenation | Column Concatenation
--- | --- | ---
Direction | Vertical (bottom) | Horizontal (right)
Result Position | New rows at bottom | New columns on right
Use Case | Adding more observations | Adding more features
For One-Hot Encoding | Incorrect approach | Correct approach
Recommended: Always pass `axis=1` when adding one-hot encoded features, so they become new columns on the right rather than new rows at the bottom.
Axis Parameter Critical

Without `axis=1`, `pd.concat` falls back to its default of `axis=0` and stacks rows: the high, low, and medium values end up at the bottom of the DataFrame instead of as new columns on the right side. The result contains the rows of both frames, and every cell where a row lacks a column is filled with NaN.
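A small experiment makes the difference concrete. The data below is hypothetical; the sketch compares the default (row-wise) result with the column-wise one and shows the NaN cells and duplicated index labels the mistake produces.

```python
import pandas as pd

# Hypothetical 4-row frame and its one-hot salary indicators.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "salary": ["low", "medium", "medium", "high"],
})
salary_one_hot = pd.get_dummies(df["salary"], dtype=int)

# Forgetting axis=1: pandas uses axis=0 and stacks the frames vertically.
wrong = pd.concat([df, salary_one_hot])
right = pd.concat([df, salary_one_hot], axis=1)

print(wrong.shape, right.shape)    # (8, 5) (4, 5)

# The original rows have no high/low/medium values, and the appended rows
# have no satisfaction_level/salary values, so those cells are NaN.
print(wrong["high"].isna().sum())  # 4
print(wrong.index.is_unique)       # False: labels 0-3 appear twice
```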

Post-Concatenation Verification

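A few checks worth running after the concatenation can be bundled into a small helper. `check_concatenation` is a hypothetical name, not a pandas function, and the one-hot check assumes the salary column had no missing values; the sample data is likewise illustrative.

```python
import pandas as pd


def check_concatenation(original: pd.DataFrame, combined: pd.DataFrame,
                        new_columns: list[str]) -> None:
    """Hypothetical helper: raise ValueError if a column-wise concat looks wrong."""
    # A column-wise concat must not change the number of rows.
    if len(combined) != len(original):
        raise ValueError("row count changed; was axis=1 passed to pd.concat?")
    # All original columns and all new indicator columns should be present.
    expected = list(original.columns) + list(new_columns)
    missing = [c for c in expected if c not in combined.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Index-aligned concatenation should not introduce missing values.
    if combined[new_columns].isna().any().any():
        raise ValueError("new columns contain NaN; check index alignment")
    # Assumes no missing salary values: exactly one indicator per row.
    if not (combined[new_columns].sum(axis=1) == 1).all():
        raise ValueError("rows are not one-hot encoded")


df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "salary": ["low", "medium", "medium", "high"],
})
salary_one_hot = pd.get_dummies(df["salary"], dtype=int)
combined = pd.concat([df, salary_one_hot], axis=1)

check_concatenation(df, combined, ["high", "low", "medium"])
print("all checks passed")
```

Passing the result of `pd.concat([df, salary_one_hot])` (without `axis=1`) to the same helper fails the first check, because the row count doubles.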

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam).

Now that we have created our high, low, and medium salary columns through one-hot encoding, the next step is to concatenate these transformed features to our existing data frame. This gives us a single, complete dataset for the train-test split that follows, a fundamental requirement for any robust machine learning pipeline.

Concatenation appends the three new binary columns to the right side of our current data frame. Because rows are matched by index, the existing row relationships are preserved while the feature set expands with the newly encoded categorical variables: each row now contains both its original features and its corresponding one-hot encoded salary indicators.

To keep the result, we must assign it back to our original data frame variable. `pd.concat` does not modify the data frames it receives; it returns a new data frame containing all previous columns plus our new encoded features. Reassigning that result to the same variable is a standard pattern in data preprocessing workflows.
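The sketch below, using hypothetical data, shows why the reassignment matters: calling `pd.concat` on its own leaves `df` unchanged, and only assigning the returned frame back to `df` keeps the new columns.

```python
import pandas as pd

# Hypothetical minimal frame with only the salary column.
df = pd.DataFrame({"salary": ["low", "medium", "high"]})
salary_one_hot = pd.get_dummies(df["salary"], dtype=int)

# pd.concat builds and returns a new DataFrame; here the result is discarded,
# so df itself is untouched.
pd.concat([df, salary_one_hot], axis=1)
print(df.shape)  # (3, 1)

# Reassigning the name keeps the combined result.
df = pd.concat([df, salary_one_hot], axis=1)
print(df.shape)  # (3, 4)
```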

The `pd.concat` function takes a list of data frames as its first argument. Here that list holds two items: our original data frame and the newly created one-hot encoded salary data frame. Crucially, we must set the concatenation axis to columns (`axis=1`, or equivalently `axis="columns"`, in pandas). Without it, the function defaults to `axis=0`, row-wise concatenation, which would stack the high, low, and medium values beneath the existing data rather than placing them alongside it as additional features.
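As a quick check of the call shape, the sketch below (hypothetical data) passes one list holding both frames and confirms that pandas treats `axis=1` and `axis="columns"` identically.

```python
import pandas as pd

# Hypothetical two-row frame and its indicators.
df = pd.DataFrame({"satisfaction_level": [0.38, 0.80], "salary": ["low", "high"]})
salary_one_hot = pd.get_dummies(df["salary"], dtype=int)

# The first argument to pd.concat is a single list holding both DataFrames.
frames = [df, salary_one_hot]

# axis=1 and axis="columns" both request column-wise concatenation.
a = pd.concat(frames, axis=1)
b = pd.concat(frames, axis="columns")
print(a.equals(b))  # True
```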

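Specifying the axis prevents a common preprocessing error. A row-wise result doubles the row count and fills the gaps with NaN, which corrupts the dataset structure and tends to cause failures downstream; many estimators, for example, reject input containing NaN values.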

Once the concatenation succeeds, examining our HR dataset shows the expected transformation: all original columns plus three new binary salary indicators (high, low, and medium). At this stage, we typically remove the original categorical salary column, since it is now redundant. The one-hot encoded columns carry the same information in a numeric form that machine learning algorithms can use, whereas most estimators cannot consume the raw string values. Dropping the salary column does not by itself remove multicollinearity, however: the three indicators always sum to 1 in every row, so for linear models with an intercept it is common to drop one of them as well.
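A minimal sketch of that cleanup, assuming the same hypothetical HR-style columns as before: `DataFrame.drop(columns=...)` removes the redundant string column, a row-sum check shows the indicators always add up to 1, and `get_dummies(..., drop_first=True)` is one standard way to keep only k-1 indicators when that matters.

```python
import pandas as pd

# Hypothetical HR-style data with the indicators already concatenated.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72],
    "salary": ["low", "medium", "medium", "high"],
})
df = pd.concat([df, pd.get_dummies(df["salary"], dtype=int)], axis=1)

# The string column is now redundant with the indicators; drop it.
df = df.drop(columns="salary")
print(df.columns.tolist())  # ['satisfaction_level', 'high', 'low', 'medium']

# The three indicators always sum to 1, so they are perfectly collinear.
print((df[["high", "low", "medium"]].sum(axis=1) == 1).all())  # True

# Optional, for linear models with an intercept: drop_first=True keeps k-1
# indicators ("high", first alphabetically, is the one removed).
reduced = pd.get_dummies(pd.Series(["low", "medium", "medium", "high"]),
                         drop_first=True, dtype=int)
print(reduced.columns.tolist())  # ['low', 'medium']
```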

Placing the encoded columns on the right also keeps the original features visually separate from the engineered ones, which makes the data frame easier to inspect, interpret, and debug.

With our feature engineering complete and our dataset properly structured, we're now positioned to tackle the next crucial phase: partitioning our data into training and testing subsets. This split will enable us to build a model that can generalize effectively to unseen data.

Key Takeaways

1. Concatenating with `axis=1` adds new columns horizontally to the right side of an existing DataFrame
2. The `pd.concat` function takes a list of DataFrames as input to combine multiple data sources
3. Specifying the column axis is essential to prevent incorrect row-wise concatenation of new features
4. Assigning the result back to the original DataFrame variable keeps the combined structure, since `pd.concat` returns a new DataFrame
5. One-hot encoded columns become additional features that enhance the DataFrame for machine learning
6. Column-wise concatenation aligns rows by index, preserving data integrity across all columns
7. The concatenated DataFrame provides all necessary columns for subsequent train-test splitting
8. Original categorical columns can be dropped in favor of their one-hot encoded representations
