Colin Jaffe/2 min read

Website Data Scraping: Navigating Beyond the First Page

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

What if we wanted to get all the results, not just from the first page as discussed? We have this first page, and we want more.

And this is where some really amazing data scraping comes in. Because now we're not scraping just one page; we're scraping the whole site.

It’s every single item in it. So, to do that, there's an element down here that says page 1 of 50. We need the text of this element.

And the reason we’ll need it is that we’re going to loop through the pages and make a request for each one. Here, that means looping 50 times, so we need to know how many pages there are.

It might be different for different pages or searches. If we want to search for historical fiction, it might result in a different number of pages. We’ll want to hit every single page, and we won’t necessarily know how many pages there are.
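To make the plan concrete, here is a minimal sketch of what that loop could be built on once the page count is known. The URL pattern and the name `build_page_urls` are hypothetical, made up for illustration; the real site may number its pages differently, so check its links before relying on this.

```python
# Hypothetical URL pattern; check how the real site numbers its pages.
URL_TEMPLATE = "https://example.com/catalogue/page-{page}.html"


def build_page_urls(total_pages, template=URL_TEMPLATE):
    """Return one URL per results page, numbered from 1 to total_pages."""
    return [template.format(page=n) for n in range(1, total_pages + 1)]


# 50 is the count shown on the first page; a different search may give another number.
urls = build_page_urls(50)
print(len(urls))
print(urls[0])
print(urls[-1])
```

Each URL in the list would then get its own request inside the loop. Building the list first keeps the page count in one place, which helps when a different search returns a different number of pages.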

So, we’re going to want to scrape and get just this one item here. This is the perfect use case for times when you just want one item. So, your next challenge is to get the text of that element, the one down here.

And as a bonus, get the last word of that text, which should be the actual number we need. Good luck, and we’ll take a look at how to solve that in just a moment.
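If you want something to compare your attempt against, here is one possible approach, sketched with Python's standard-library `html.parser` so it runs without installing anything. The sample markup, the `current` class name, and the names `PagerTextParser` and `pager_text` are assumptions for illustration; inspect the real page to find the actual tag and class. A library such as Beautiful Soup would make the lookup shorter.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's tag and class names may differ.
SAMPLE_HTML = """
<ul class="pager">
  <li class="current">
      Page 1 of 50
  </li>
  <li class="next"><a href="page-2.html">next</a></li>
</ul>
"""


class PagerTextParser(HTMLParser):
    """Collects the text of the first element that carries a given class.

    Assumes the target element contains only text, with no nested tags.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.capturing = False  # True while inside the target element
        self.done = False       # True once the first match has been read
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or self.capturing:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.capturing = True

    def handle_endtag(self, tag):
        if self.capturing:
            self.capturing = False
            self.done = True

    def handle_data(self, data):
        if self.capturing:
            self.chunks.append(data)


def pager_text(html, target_class="current"):
    """Return the target element's text with surrounding whitespace collapsed."""
    parser = PagerTextParser(target_class)
    parser.feed(html)
    return " ".join("".join(parser.chunks).split())


text = pager_text(SAMPLE_HTML)
total_pages = int(text.split()[-1])  # the last word is the page count
print(text)         # Page 1 of 50
print(total_pages)  # 50
```

Splitting on whitespace and taking the last item gives the bonus answer, and converting it with `int` makes it usable as a loop bound.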