Colin Jaffe/1 min read

Accurate Data Column Order for Predictive Modeling

Match Column Order at Predict Time

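Most scikit-learn models identify features by position, not by name. If your prediction-time data has its columns in a different order than the training data, predictions can be silently wrong: plain arrays are never checked, and only recent scikit-learn versions raise an error when a DataFrame's column names or order differ from those seen during fit. Always reorder predict-time inputs to match the training feature list, or use a Pipeline that handles it for you.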

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Quick note on an error I made in the earlier version of this: I had 'Age' second here when it should have been, I think, fourth. That matters because the model doesn't know what these columns are called or what they represent.

The order is all that it knows. It learned that the fourth column appeared to be a pretty good predictor, and that column was Age.
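To make the point concrete, here is a minimal sketch, not the code from the video. It assumes a made-up dataset of four numeric columns in which only the fourth one (standing in for 'Age') drives the target, and it uses `LinearRegression` purely for illustration. Swapping two columns at predict time raises no error on a plain array, yet the predictions stop matching.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical training data: four numeric columns, with "Age" in the
# fourth position (index 3). The other three columns are placeholders.
X_train = rng.normal(size=(200, 4))
y_train = 5.0 * X_train[:, 3]  # the target depends only on the fourth column

model = LinearRegression().fit(X_train, y_train)

# Same rows, but with the second and fourth columns swapped,
# so "Age" now sits in the second position.
X_swapped = X_train[:, [0, 3, 2, 1]]

pred_correct = model.predict(X_train)
pred_swapped = model.predict(X_swapped)

# With the training order, predictions reproduce the target.
print(np.allclose(pred_correct, y_train))
# With the swapped order, nothing fails, but predictions no longer match.
print(np.allclose(pred_swapped, y_train))
```

The model has simply learned a large weight for "whatever is in position four", so it applies that weight to the wrong values once the columns move.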

So, here I've put it back to the original, incorrect version, which I'm about to fix. If you're following along from the earlier video, you should move 'Age' to the fourth position. Then, to make sure we're keeping this all straight, I'm going to run all the previous code.
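One way to avoid fixing the order by hand is to record the training column order once and reuse it at predict time. The sketch below assumes pandas DataFrames. Only 'Age' comes from this lesson: the other column names and the `align_columns` helper are hypothetical, introduced just for illustration.

```python
import pandas as pd

# Hypothetical training frame; only "Age" comes from the lesson,
# the other column names are placeholders.
train_df = pd.DataFrame({
    "feature_a": [1, 2, 3],
    "feature_b": [0, 1, 0],
    "feature_c": [7.5, 8.0, 9.1],
    "Age": [22, 38, 26],
})

# Record the column order once, at training time.
feature_order = list(train_df.columns)


def align_columns(df, feature_order):
    """Return df restricted to the training columns, in training order."""
    missing = [col for col in feature_order if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns at predict time: {missing}")
    return df[feature_order]


# Predict-time frame with "Age" accidentally in the second position.
new_df = pd.DataFrame({
    "feature_a": [4],
    "Age": [35],
    "feature_b": [1],
    "feature_c": [8.3],
})

aligned = align_columns(new_df, feature_order)
print(list(aligned.columns))
```

Selecting with a list of names reorders the columns without touching the values, and the explicit check for missing columns turns a silent mismatch into a loud error.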

That way, we'll start off in the correct place. Then I'm going to run this. Great.

Now I'm going to run the next line, which I had started to run before.