April 2, 2026 · Colin Jaffe · 3 min read

Scraping Book Titles and Prices from Multiple Web Pages Using Python

Master Python web scraping across multiple pages efficiently

Web Scraping Project Overview

Pages to scrape: 50
Total book records: 1,000
Data columns extracted: 2
Understanding URL Patterns

The key to scraping multiple pages is recognizing URL patterns. In this case, pages follow the format 'books.toscrape.com/catalog/page{number}.html' where the number increments from 1 to 50.
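To make the pattern concrete, here is a minimal sketch that generates the full list of page URLs, assuming the base URL and 50-page count described above:

```python
# Build the 50 page URLs by inserting each page number into the pattern.
pagination_max = 50  # last page discovered on the site

urls = [
    f"https://books.toscrape.com/catalog/page{page_num}.html"
    for page_num in range(1, pagination_max + 1)  # range is end-exclusive
]

print(urls[0])   # https://books.toscrape.com/catalog/page1.html
print(urls[-1])  # https://books.toscrape.com/catalog/page50.html
```

With the URL list in hand, the scraping loop only needs to fetch and parse each entry in turn.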

Multi-Page Scraping Process

1. Reset Data Containers: Initialize empty lists for titles and prices to collect data from all pages.

2. Loop Through Page Range: Use range(1, pagination_max + 1) to iterate through pages 1 to 50.

3. Build Dynamic URLs: Create f-string URLs that insert the current page number into the URL pattern.

4. Make HTTP Requests: Send GET requests to each page URL and parse the response with BeautifulSoup.

5. Extract and Append Data: Find HTML elements containing titles and prices, then add them to the growing lists.

Key Python Techniques Used

F-string URL Construction

Dynamic URL building using f-strings to insert page numbers. Essential for programmatic navigation across paginated content.

List Comprehensions

Efficient data extraction using list comprehensions within loops. Combines finding elements and extracting attributes in single expressions.

Data Type Conversion

Converting scraped price text to float numbers after removing currency symbols. Critical for numerical analysis of extracted data.

Range Function Exclusivity

Remember that Python's range() function is exclusive at the end. To scrape pages 1-50, use range(1, 51) or range(1, pagination_max + 1).

Multi-Page Scraping Approach

Pros:
- Captures the complete dataset across all pages
- Scalable to any number of pages
- Maintains data consistency throughout the process
- Allows comprehensive analysis of the full dataset

Cons:
- Makes 50 separate HTTP requests
- Takes longer to execute than single-page scraping
- Higher risk of connection timeouts or failures
- May trigger rate limiting on some websites



This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

For our comprehensive finale, we'll reset our data structures and systematically loop through all available pages to capture the complete dataset. Examining the URL structure reveals a predictable pattern: we start at books.toscrape.com/catalog/page1.html, and subsequent pages follow the sequential format books.toscrape.com/catalog/page2.html, page3.html, and so forth. Our strategy involves iterating through pages 1 to 50 (our discovered pagination maximum), creating a Beautiful Soup object for each page, and methodically extracting titles and prices into our consolidated lists.

Let's implement this scalable solution. We'll use a for loop structure: `for page_num in range(1, pagination_max + 1)` where pagination_max represents our previously determined value of 50. This approach ensures we capture every available page without hard-coding limitations.

Notice the critical `+ 1` addition—this compensates for Python's range function being exclusive at the upper bound. When we specify range(1, 51), we get numbers 1 through 50, exactly what we need. For each iteration, we'll dynamically construct the target URL using Python's f-string formatting, allowing us to inject the current page number into our base URL template.
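A quick check confirms the exclusivity behavior:

```python
# range() excludes its upper bound, so range(1, 51) yields 1 through 50.
pages = list(range(1, 51))
print(pages[0], pages[-1], len(pages))  # 1 50 50
```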

The URL construction follows this pattern: `f"https://books.toscrape.com/catalog/page{page_num}.html"` where page_num cycles from 1 to 50. This systematic approach ensures we don't miss any pages while maintaining clean, readable code. Once we have our target URL, we execute the familiar request-response cycle: `response = requests.get(url)` followed by `soup = BeautifulSoup(response.content, 'html.parser')`.

This implementation will generate 50 separate HTTP requests—a significant operation that requires patience as each request involves network latency and server processing time. In production environments, you'd want to implement rate limiting and error handling to maintain respectful scraping practices and handle potential connectivity issues.
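Putting the pieces above together, a sketch of the full loop might look like the following. The helper names `parse_page` and `scrape_all` are introduced here for illustration (the lesson's code runs the loop inline); the network portion is kept inside a function so the parsing logic can be exercised on its own.

```python
import requests
from bs4 import BeautifulSoup

def parse_page(html):
    """Extract (titles, prices) lists from one catalogue page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    titles = [h3.find("a")["title"] for h3 in soup.find_all("h3")]
    prices = [
        float(p.get_text()[1:])  # drop the leading pound symbol, convert
        for p in soup.find_all("p", class_="price_color")
    ]
    return titles, prices

def scrape_all(pagination_max=50):
    """Loop through every page, accumulating titles and prices."""
    titles, prices = [], []  # reset data containers
    for page_num in range(1, pagination_max + 1):  # pages 1..50
        url = f"https://books.toscrape.com/catalog/page{page_num}.html"
        response = requests.get(url)  # one HTTP request per page
        page_titles, page_prices = parse_page(response.content)
        titles = titles + page_titles
        prices = prices + page_prices
    return titles, prices
```

Calling `scrape_all()` would issue all 50 requests sequentially; adding a short `time.sleep()` between iterations is a common courtesy to the target server.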


Now we need to generalize our earlier extraction logic for bulk processing. For titles, we'll use list concatenation, `titles = titles + [...]`, where the bracketed expression is the list comprehension that extracts the current page's titles. While there are other approaches, including `list.extend()` or `+=`, concatenation provides clarity and preserves our existing data structure.

The title extraction follows our established pattern: first, we locate all H3 elements with `h3s = soup.find_all('h3')`, then we extract the title attribute from each nested anchor tag using a list comprehension: `[h3.find('a')['title'] for h3 in h3s]`. This efficiently processes all titles on the current page in a single operation.
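Run against a small inline fragment shaped like the site's markup (the sample HTML here is a hypothetical stand-in, not fetched from the live site), the pattern looks like this:

```python
from bs4 import BeautifulSoup

# Sample fragment mimicking the catalogue's markup: the full title lives
# in the anchor's 'title' attribute, while the link text may be truncated.
sample_html = """
<h3><a title="A Light in the Attic">A Light in the ...</a></h3>
<h3><a title="Tipping the Velvet">Tipping the Velvet</a></h3>
"""

soup = BeautifulSoup(sample_html, "html.parser")
h3s = soup.find_all("h3")
titles = [h3.find("a")["title"] for h3 in h3s]
print(titles)  # ['A Light in the Attic', 'Tipping the Velvet']
```

Note that we read the `title` attribute rather than the link text, since the visible text is often truncated on the page.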

Price extraction requires additional processing since we need numerical values rather than raw text. We target paragraph tags with the 'price_color' class: `price_elements = soup.find_all('p', class_='price_color')`. Each price element contains text like "£51.77" that requires cleaning and conversion.

Our price processing pipeline involves three steps: extract the text content using `.get_text()`, remove the pound symbol with string slicing (`[1:]` to skip the first character), and convert to float for numerical operations. The complete operation looks like: `[float(element.get_text()[1:]) for element in price_elements]`. This transforms raw price strings into proper numerical data suitable for analysis and calculations.
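The three-step pipeline can be sketched against a small sample fragment (hypothetical values in the site's format):

```python
from bs4 import BeautifulSoup

sample_html = """
<p class="price_color">£51.77</p>
<p class="price_color">£53.74</p>
"""

soup = BeautifulSoup(sample_html, "html.parser")
price_elements = soup.find_all("p", class_="price_color")

# Step 1: get_text() pulls the raw string, e.g. "£51.77".
# Step 2: [1:] slices off the leading pound symbol.
# Step 3: float() converts the remainder for numerical work.
prices = [float(el.get_text()[1:]) for el in price_elements]
print(prices)  # [51.77, 53.74]
```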


When executed, this loop systematically processes all 50 pages, requiring several minutes to complete due to the sequential nature of HTTP requests. The result is comprehensive datasets containing every title and price from the entire catalog.

To verify our success, we'll construct a pandas DataFrame for immediate analysis: `books = pd.DataFrame({'title': titles, 'price': prices})`. This creates a structured dataset with 1,000 rows (representing every book) and two columns (title and price), providing a complete foundation for data analysis, visualization, and further processing. The DataFrame format enables powerful operations like sorting, filtering, statistical analysis, and export to various formats—transforming our web scraping effort into actionable business intelligence.
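A small-scale sketch of that final step, using two hypothetical records in place of the full 1,000-row lists:

```python
import pandas as pd

# Stand-in scraped results; the real lists hold 1,000 entries each.
titles = ["A Light in the Attic", "Tipping the Velvet"]
prices = [51.77, 53.74]

books = pd.DataFrame({"title": titles, "price": prices})
print(books.shape)  # (rows, columns)
print(books["price"].mean())  # ready for numerical analysis
```

Because the price column is already numeric, operations like `mean()`, `sort_values()`, and filtering work immediately, with no further cleaning required.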

Key Takeaways

1. Multi-page web scraping requires understanding URL patterns and implementing loops to iterate through paginated content systematically.
2. Python's range() function is exclusive at the end, so scraping pages 1-50 requires range(1, 51) or range(1, pagination_max + 1).
3. F-string formatting enables dynamic URL construction by inserting page numbers into URL templates during loop iterations.
4. List comprehensions provide an efficient way to extract data from HTML elements, combining element finding and attribute extraction in single expressions.
5. Data cleaning is essential when scraping prices: text must be stripped of currency symbols and converted to float for numerical analysis.
6. Making multiple HTTP requests (50 in this case) takes significantly longer than single-page scraping but captures complete datasets.
7. Successful execution of this approach yielded 1,000 book records with titles and prices from 50 pages of the target website.
8. The final result can be structured into a pandas DataFrame for further analysis, with proper column naming for data-centric workflows.
