Colin Jaffe/2 min read

Data Frames: Concatenating Columns for Effective Splitting

ML Project Workflow

1. Define the Problem: What outcome are you predicting and why?
2. Prepare the Data: Clean, normalize, encode categoricals, split into train/test.
3. Train Models: Start simple; logistic regression baselines often surprise.
4. Evaluate & Iterate: Confusion matrix, ROC, F1. Pick metrics that match the problem.

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Now we have these columns for high, low, and medium. We want to concatenate them onto the end of our data frame so that, when we later split it into training and testing datasets, all the necessary columns are already there.

Here’s how we’ll do that. We’ll concatenate them onto the right side of our data frame, which adds these three new columns to each row.

We need to do a couple of things to achieve this. The first is to assign the result of the concatenation back to our data frame: our data frame becomes the result of concatenating the old data frame with the new one.

The pandas `concat` function takes a list of data frames, so we’ll pass the old data frame and the new one (the salary one-hot encoding). Finally, we need to specify that the concatenation should be done by columns, which `concat` controls through its `axis` argument.

Otherwise, it will assume rows and stack the high, low, and medium columns at the bottom of the data instead of placing them on the right side.
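The concatenation described above can be sketched as follows. This is a minimal example, not the lesson's exact code: the names `hr_data` and `salary_dummies`, the tiny sample values, and the use of `pd.get_dummies` to build the one-hot columns are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the HR data; the real dataset has more columns and rows.
hr_data = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11],
    "salary": ["low", "medium", "high"],
})

# Assumed one-hot encoding of salary: one column each for high, low, and medium.
salary_dummies = pd.get_dummies(hr_data["salary"])

# Without an axis argument, concat stacks by rows: the new columns end up
# below the original data, with missing values filling the gaps.
stacked_by_rows = pd.concat([hr_data, salary_dummies])

# With axis="columns" (equivalent to axis=1), the new columns are attached
# on the right, and the result is assigned back to the data frame.
hr_data = pd.concat([hr_data, salary_dummies], axis="columns")

print(stacked_by_rows.shape)     # twice as many rows, mostly missing values
print(hr_data.columns.tolist())  # original columns, then high, low, medium
```

Comparing the two shapes makes the difference visible: the row-wise version doubles the number of rows, while the column-wise version keeps the row count and only widens the data frame.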

Once we’ve done that, we can check the HR data to see the result. It still has all our previous columns, including salary, and now also has high, low, and medium on the right side.

When we choose the columns for our model, we’ll exclude the original salary column and include the high, low, and medium columns instead.
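As a rough sketch of that check, assuming the data frame is called `hr_data` and already holds the concatenated columns (the sample values and the name `features` are hypothetical), the original salary column can be left out like this:

```python
import pandas as pd

# Hypothetical data frame as it might look after the concatenation step.
hr_data = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11],
    "salary": ["low", "medium", "high"],
    "high": [False, False, True],
    "low": [True, False, False],
    "medium": [False, True, False],
})

# Inspect the first rows: salary is still present, followed by the new columns.
print(hr_data.head())

# Leave out the original text column and keep the one-hot columns in its place.
# drop returns a new data frame, so hr_data itself is unchanged.
features = hr_data.drop(columns=["salary"])
print(features.columns.tolist())
```

Dropping into a separate variable, rather than overwriting `hr_data`, keeps the original salary values available in case they are needed for inspection later.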

Now, our next step is splitting the data. Let’s proceed.