Colin Jaffe/2 min read

Utilizing Pandas for Data Calculations and Predictions

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Use pandas to create a data frame and perform vector operations to calculate predictions from attendance and concessions data. Watch this tutorial to learn the key concepts and techniques.

Let's use a data frame to make more complex calculations here, specifically to perform a vector operation on these values. I'm going to call it `concessions_df`, and it will be the result of using pandas to create a data frame. We're going to pass a dictionary to the `DataFrame` constructor with our values.

The first key will be `attendance`, which becomes the name of our first column, and the value of that column will be the `attendance` list. Then we'll add another key called `concessions`, and its value will be our `concessions` Python list.
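Here is a minimal sketch of that constructor call. The numbers in the two lists are hypothetical stand-ins, since the lesson's actual values aren't shown in this excerpt; only the names `concessions_df`, `attendance`, and `concessions` come from the lesson.

```python
import pandas as pd

# Hypothetical sample values; the lesson's real lists are not shown here.
attendance = [20000, 22000, 25000, 27000, 30000]
concessions = [100000, 112000, 123000, 138000, 149000]

# Each dictionary key becomes a column name, and each list becomes that column's values.
concessions_df = pd.DataFrame({
    "attendance": attendance,
    "concessions": concessions,
})

print(concessions_df)
```

The two lists need to be the same length, because each position in the lists becomes one row of the data frame.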

Now we can perform those same vector operations, operating on every row in the column at once, by replacing the plain attendance list with `concessions_df['attendance']`. A pandas column applies arithmetic element by element, which a plain Python list does not. Let's see if this fixes our issue. Ah, that's looking good.
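To see why the swap matters, here is a small sketch comparing the two. The `slope` and `intercept` values are hypothetical placeholders for whatever line you are working with, and the error shown for the plain list is presumably the kind of issue being fixed here; the excerpt doesn't spell it out.

```python
import pandas as pd

# Hypothetical values for illustration only.
attendance = [20000, 22000, 25000]
slope, intercept = 5.0, 1000.0

# A plain Python list can't be multiplied by a float, so this raises TypeError.
list_error = None
try:
    attendance * slope + intercept
except TypeError as err:
    list_error = err
print("List arithmetic failed:", list_error)

# A data frame column (a pandas Series) applies the arithmetic to every row.
concessions_df = pd.DataFrame({"attendance": attendance})
predicted = concessions_df["attendance"] * slope + intercept
print(predicted)
```

The column version gives back a new Series with one result per row, so no explicit loop is needed.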

There we are. Thanks, pandas. So this line is our best fit line.

And again, there are still some outliers, but if we are given an attendance value like 27,000, or maybe 28,000, we could say, hey, concessions are likely to be right around here, because this line should be fairly predictive. It would be even more accurate if we had more data, and we'll get that later in this course, where we'll have quite a lot of data. Let's see how this prediction works.
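As a rough sketch of reading a prediction off a best fit line, the example below fits a straight line with `numpy.polyfit` and evaluates it at an attendance of 27,000. Both the data and the choice of `numpy.polyfit` are assumptions for illustration; this excerpt doesn't show how the lesson's own line was computed.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the lesson's real values are not shown in this excerpt.
concessions_df = pd.DataFrame({
    "attendance": [20000, 22000, 25000, 26000, 29000, 30000],
    "concessions": [98000, 113000, 121000, 135000, 141000, 152000],
})

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns the coefficients from highest degree down: slope, then intercept.
slope, intercept = np.polyfit(
    concessions_df["attendance"], concessions_df["concessions"], 1
)

# Vector operation: the line's value for every row in the column.
concessions_df["best_fit"] = concessions_df["attendance"] * slope + intercept

# Prediction for an attendance figure that is not in the data.
new_attendance = 27000
predicted_concessions = slope * new_attendance + intercept
print(f"Predicted concessions at {new_attendance}: {predicted_concessions:.0f}")
```

With only a handful of rows, a single outlier can tilt the line noticeably, which is why more data should make the prediction steadier.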