April 2, 2026 · Colin Jaffe · 3 min read

Scraping Book Titles and Prices from Multiple Web Pages Using Python

Master Python web scraping across multiple pages efficiently

Web Scraping Project Overview

Pages to scrape: 50
Total book records: 1,000
Data columns extracted: 2
Understanding URL Patterns

The key to scraping multiple pages is recognizing URL patterns. In this case, pages follow the format 'books.toscrape.com/catalog/page{number}.html' where the number increments from 1 to 50.
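To make the pattern concrete, here is a minimal sketch that generates the full list of page URLs, assuming the base URL and 50-page count described above:

```python
# Build the 50 page URLs by inserting each page number into the pattern.
pagination_max = 50  # last page discovered on the site

urls = [
    f"https://books.toscrape.com/catalog/page{page_num}.html"
    for page_num in range(1, pagination_max + 1)  # range is end-exclusive
]

print(urls[0])   # https://books.toscrape.com/catalog/page1.html
print(urls[-1])  # https://books.toscrape.com/catalog/page50.html
```

With the URL list in hand, the scraping loop only needs to fetch and parse each entry in turn.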

Multi-Page Scraping Process

1. Reset Data Containers: Initialize empty lists for titles and prices to collect data from all pages.

2. Loop Through Page Range: Use range(1, pagination_max + 1) to iterate through pages 1 to 50.

3. Build Dynamic URLs: Create f-string URLs that insert the current page number into the URL pattern.

4. Make HTTP Requests: Send GET requests to each page URL and parse the response with BeautifulSoup.

5. Extract and Append Data: Find HTML elements containing titles and prices, then add them to the growing lists.

Key Python Techniques Used

F-string URL Construction

Dynamic URL building using f-strings to insert page numbers. Essential for programmatic navigation across paginated content.

List Comprehensions

Efficient data extraction using list comprehensions within loops. Combines finding elements and extracting attributes in single expressions.

Data Type Conversion

Converting scraped price text to float numbers after removing currency symbols. Critical for numerical analysis of extracted data.

Range Function Exclusivity

Remember that Python's range() function is exclusive at the end. To scrape pages 1-50, use range(1, 51) or range(1, pagination_max + 1).

Multi-Page Scraping Approach

Pros:
- Captures the complete dataset across all pages
- Scalable to any number of pages
- Maintains data consistency throughout the process
- Allows comprehensive analysis of the full dataset

Cons:
- Makes 50 separate HTTP requests
- Takes longer to execute than single-page scraping
- Higher risk of connection timeouts or failures
- May trigger rate limiting on some websites



This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

For our comprehensive finale, we'll reset our data structures and systematically loop through all available pages to capture the complete dataset. Examining the URL structure reveals a predictable pattern: we start at books.toscrape.com/catalog/page1.html, and subsequent pages follow the sequential format books.toscrape.com/catalog/page2.html, page3.html, and so forth. Our strategy involves iterating through pages 1 to 50 (our discovered pagination maximum), creating a Beautiful Soup object for each page, and methodically extracting titles and prices into our consolidated lists.

Let's implement this scalable solution. We'll use a for loop structure: `for page_num in range(1, pagination_max + 1)` where pagination_max represents our previously determined value of 50. This approach ensures we capture every available page without hard-coding limitations.

Notice the critical `+ 1` addition—this compensates for Python's range function being exclusive at the upper bound. When we specify range(1, 51), we get numbers 1 through 50, exactly what we need. For each iteration, we'll dynamically construct the target URL using Python's f-string formatting, allowing us to inject the current page number into our base URL template.
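A quick check confirms the exclusivity behavior:

```python
# range() excludes its upper bound, so range(1, 51) yields 1 through 50.
pages = list(range(1, 51))
print(pages[0], pages[-1], len(pages))  # 1 50 50
```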

The URL construction follows this pattern: `f"https://books.toscrape.com/catalog/page{page_num}.html"` where page_num cycles from 1 to 50. This systematic approach ensures we don't miss any pages while maintaining clean, readable code. Once we have our target URL, we execute the familiar request-response cycle: `response = requests.get(url)` followed by `soup = BeautifulSoup(response.content, 'html.parser')`.

This implementation will generate 50 separate HTTP requests—a significant operation that requires patience as each request involves network latency and server processing time. In production environments, you'd want to implement rate limiting and error handling to maintain respectful scraping practices and handle potential connectivity issues.
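Putting the pieces above together, a sketch of the full loop might look like the following. The helper names `parse_page` and `scrape_all` are introduced here for illustration (the lesson's code runs the loop inline); the network portion is kept inside a function so the parsing logic can be exercised on its own.

```python
import requests
from bs4 import BeautifulSoup

def parse_page(html):
    """Extract (titles, prices) lists from one catalogue page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    titles = [h3.find("a")["title"] for h3 in soup.find_all("h3")]
    prices = [
        float(p.get_text()[1:])  # drop the leading pound symbol, convert
        for p in soup.find_all("p", class_="price_color")
    ]
    return titles, prices

def scrape_all(pagination_max=50):
    """Loop through every page, accumulating titles and prices."""
    titles, prices = [], []  # reset data containers
    for page_num in range(1, pagination_max + 1):  # pages 1..50
        url = f"https://books.toscrape.com/catalog/page{page_num}.html"
        response = requests.get(url)  # one HTTP request per page
        page_titles, page_prices = parse_page(response.content)
        titles = titles + page_titles
        prices = prices + page_prices
    return titles, prices
```

Calling `scrape_all()` would issue all 50 requests sequentially; adding a short `time.sleep()` between iterations is a common courtesy to the target server.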


Now we need to generalize our earlier extraction logic for bulk processing. For titles, we'll use list concatenation, `titles = titles + [...]`, where the bracketed expression is the list comprehension that extracts the current page's titles. While there are other approaches, including `list.extend()` or `+=`, concatenation provides clarity and preserves our existing data structure.

The title extraction follows our established pattern: first, we locate all H3 elements with `h3s = soup.find_all('h3')`, then we extract the title attribute from each nested anchor tag using a list comprehension: `[h3.find('a')['title'] for h3 in h3s]`. This efficiently processes all titles on the current page in a single operation.
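Run against a small inline fragment shaped like the site's markup (the sample HTML here is a hypothetical stand-in, not fetched from the live site), the pattern looks like this:

```python
from bs4 import BeautifulSoup

# Sample fragment mimicking the catalogue's markup: the full title lives
# in the anchor's 'title' attribute, while the link text may be truncated.
sample_html = """
<h3><a title="A Light in the Attic">A Light in the ...</a></h3>
<h3><a title="Tipping the Velvet">Tipping the Velvet</a></h3>
"""

soup = BeautifulSoup(sample_html, "html.parser")
h3s = soup.find_all("h3")
titles = [h3.find("a")["title"] for h3 in h3s]
print(titles)  # ['A Light in the Attic', 'Tipping the Velvet']
```

Note that we read the `title` attribute rather than the link text, since the visible text is often truncated on the page.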

Price extraction requires additional processing since we need numerical values rather than raw text. We target paragraph tags with the 'price_color' class: `price_elements = soup.find_all('p', class_='price_color')`. Each price element contains text like "£51.77" that requires cleaning and conversion.

Our price processing pipeline involves three steps: extract the text content using `.get_text()`, remove the pound symbol with string slicing (`[1:]` to skip the first character), and convert to float for numerical operations. The complete operation looks like: `[float(element.get_text()[1:]) for element in price_elements]`. This transforms raw price strings into proper numerical data suitable for analysis and calculations.
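The three-step pipeline can be sketched against a small sample fragment (hypothetical values in the site's format):

```python
from bs4 import BeautifulSoup

sample_html = """
<p class="price_color">£51.77</p>
<p class="price_color">£53.74</p>
"""

soup = BeautifulSoup(sample_html, "html.parser")
price_elements = soup.find_all("p", class_="price_color")

# Step 1: get_text() pulls the raw string, e.g. "£51.77".
# Step 2: [1:] slices off the leading pound symbol.
# Step 3: float() converts the remainder for numerical work.
prices = [float(el.get_text()[1:]) for el in price_elements]
print(prices)  # [51.77, 53.74]
```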


When executed, this loop systematically processes all 50 pages, requiring several minutes to complete due to the sequential nature of HTTP requests. The result is comprehensive datasets containing every title and price from the entire catalog.

To verify our success, we'll construct a pandas DataFrame for immediate analysis: `books = pd.DataFrame({'title': titles, 'price': prices})`. This creates a structured dataset with 1,000 rows (representing every book) and two columns (title and price), providing a complete foundation for data analysis, visualization, and further processing. The DataFrame format enables powerful operations like sorting, filtering, statistical analysis, and export to various formats—transforming our web scraping effort into actionable business intelligence.
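A small-scale sketch of that final step, using two hypothetical records in place of the full 1,000-row lists:

```python
import pandas as pd

# Stand-in scraped results; the real lists hold 1,000 entries each.
titles = ["A Light in the Attic", "Tipping the Velvet"]
prices = [51.77, 53.74]

books = pd.DataFrame({"title": titles, "price": prices})
print(books.shape)  # (rows, columns)
print(books["price"].mean())  # ready for numerical analysis
```

Because the price column is already numeric, operations like `mean()`, `sort_values()`, and filtering work immediately, with no further cleaning required.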

Key Takeaways

1. Multi-page web scraping requires understanding URL patterns and implementing loops to iterate through paginated content systematically.
2. Python's range() function is exclusive at the end, so scraping pages 1-50 requires range(1, 51) or range(1, pagination_max + 1).
3. F-string formatting enables dynamic URL construction by inserting page numbers into URL templates during loop iterations.
4. List comprehensions provide an efficient way to extract data from HTML elements, combining element finding and attribute extraction in single expressions.
5. Data cleaning is essential when scraping prices: text must be stripped of currency symbols and converted to float for numerical analysis.
6. Making multiple HTTP requests (50 in this case) takes significantly longer than single-page scraping but captures complete datasets.
7. Successful execution of this approach yielded 1,000 book records with titles and prices from 50 pages of the target website.
8. The final result can be structured into a pandas DataFrame for further analysis, with proper column naming for data-centric workflows.
