April 2, 2026 · Colin Jaffe · 5 min read

Web Scraping: Extracting Non-Truncated Titles and Prices with Python

Master Python web scraping for complete data extraction

Web Scraping Fundamentals

200: the HTTP status code that indicates a successful request
2: key data types to extract (titles and prices)
3: main HTML tags used (`<h3>`, `<a>`, and `<p>`)
Required Libraries

This tutorial uses Beautiful Soup for HTML parsing and the requests library for HTTP operations. Both are essential for effective web scraping in Python.

Basic Web Scraping Workflow

1. Make HTTP Request: use `requests.get()` to fetch the webpage and verify that the status code is 200.

2. Parse HTML Content: create a Beautiful Soup object from the response content using the HTML parser.

3. Locate Target Elements: find the specific HTML tags containing the data you need to extract.

4. Extract and Process Data: retrieve the text content and clean it for your specific use case.
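The four steps above can be sketched end to end. To keep the example runnable without a network connection, it parses a small inline HTML snippet modeled on the book listing page; in a real run, the HTML would come from `requests.get(url)` after confirming `response.status_code == 200`.

```python
from bs4 import BeautifulSoup

# Step 1 (simulated): in practice this HTML would be response.content
# from requests.get(url), checked for status code 200 first.
sample_html = """
<html><body>
  <h3><a title="A Light in the Attic" href="#">A Light in ...</a></h3>
  <p class="price_color">£51.77</p>
</body></html>
"""

# Step 2: parse the HTML content
soup = BeautifulSoup(sample_html, 'html.parser')

# Step 3: locate the target element
title_tag = soup.find('h3')

# Step 4: extract and process the data
title = title_tag.find('a').get_text()
print(title)
```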

Loop vs List Comprehension Approaches

Feature             | Traditional Loop         | List Comprehension
--------------------|--------------------------|---------------------------
Readability         | More verbose             | Concise
Performance         | Standard                 | Slightly faster
Complexity handling | Better for complex logic | Best for simple operations
Code length         | Multiple lines           | Single line

Recommended: use list comprehensions for simple operations and loops for complex logic.
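As a concrete illustration (using a hypothetical word list rather than scraped data), both forms produce identical results:

```python
words = ['alpha', 'beta', 'gamma']

# Traditional loop: more verbose, but easy to extend with complex logic
upper_loop = []
for word in words:
    upper_loop.append(word.upper())

# List comprehension: the same transformation in a single line
upper_comp = [word.upper() for word in words]

print(upper_loop == upper_comp)  # both give ['ALPHA', 'BETA', 'GAMMA']
```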

HTML Element Targeting Strategies

By Tag Name

Use find_all('h3') to locate all H3 tags. Simple but may capture unintended elements if the page structure is complex.

By Class Attribute

Target specific elements using class names like 'price_color'. More precise than tag-only selection for styled content.

Nested Element Search

Find A tags within H3 elements using tag.find('a'). Essential for extracting specific content from complex structures.

Text vs Title Attribute Extraction

Pros
Title attributes contain complete, non-truncated content
Provides full information even when display text is shortened
Better for data analysis requiring complete text
Avoids ellipsis and truncation issues
Cons
Not all HTML elements have title attributes
Requires knowledge of the specific attribute structure
May not match exactly what users see on the page
Additional step in the extraction process
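The trade-off is easy to see side by side. In this hypothetical anchor, the display text is truncated while the `title` attribute holds the full string:

```python
from bs4 import BeautifulSoup

# Display text is shortened with an ellipsis; the title attribute is complete
html = '<h3><a title="The Grand Design" href="#">The Grand ...</a></h3>'
soup = BeautifulSoup(html, 'html.parser')
link = soup.find('a')

display_text = link.get_text()  # truncated: what the user sees on the page
full_title = link['title']      # complete, non-truncated content

print(display_text)
print(full_title)
```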

Price Data Cleaning Process

1. Extract Raw Price Strings: use Beautiful Soup to find all paragraph tags with the class `price_color` and get their text content.

2. Remove Currency Symbols: apply `.strip('£')` to each price string to remove the pound symbol (note that `strip()` with no argument only removes whitespace).

3. Convert to Numeric Format: use the `float()` function to convert the cleaned strings into numerical values for calculations and analysis.

4. Validate Results: print and verify that the prices are now proper floating-point numbers without currency symbols.
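Steps 2 through 4 can be sketched on a few sample price strings (step 1 is assumed to have produced them):

```python
# Raw price strings as they come out of Beautiful Soup
raw_prices = ['£51.77', '£53.74', '£50.10']

# Steps 2-3: strip the currency symbol, then convert to float
prices = [float(p.strip('£')) for p in raw_prices]

# Step 4: validate the results
print(prices)
assert all(isinstance(p, float) for p in prices)
```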

Data Quality Verification

Best Practice for Data Processing

Always clean and validate your scraped data immediately after extraction. Converting prices to numerical format enables proper sorting, calculations, and integration with data analysis frameworks like pandas.
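Once the prices are numeric, the operations mentioned above become trivial. This sketch uses hypothetical values and sticks to built-in functions (the same list would drop straight into a pandas Series or DataFrame):

```python
prices = [51.77, 53.74, 50.10, 47.82]

# Numeric prices unlock sorting and aggregate calculations
cheapest = min(prices)
average = sum(prices) / len(prices)

print(sorted(prices))
print(cheapest, round(average, 2))
```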

This lesson is a preview from our Data Science & AI Certificate Online (includes software) and Python Certification Online (includes software & exam). Enroll in a course for detailed lessons, live instructor support, and project-based training.

Let's examine several approaches to solving this web scraping challenge. We'll begin by making a request and storing the response: `response = requests.get(url)`. Before proceeding, it's crucial to validate that the request succeeded by confirming the status code is 200; if it is not, we should stop rather than try to parse the page.

If the status code indicates failure, we should handle this gracefully with an appropriate error message: "Error: book data not available for scraping." This defensive programming approach prevents downstream errors and provides clear feedback when resources are inaccessible.
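One way to express this check is a small guard function (a hypothetical helper, exercised here with stand-in objects instead of a live `requests.get()` call):

```python
from types import SimpleNamespace

def response_ok(response):
    """Return True if the page was fetched successfully; otherwise report it."""
    if response.status_code != 200:
        print("Error: book data not available for scraping.")
        return False
    return True

# Stand-ins for a real requests.Response, so this runs without a network
assert response_ok(SimpleNamespace(status_code=200)) is True
assert response_ok(SimpleNamespace(status_code=404)) is False
```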

Now let's extract the book titles from the page structure. Upon inspecting the HTML, we can see that each title is contained within an anchor tag (`<a>`) nested inside an `<h3>` tag. This means we need to find all `<h3>` tags first, then extract the text from their child anchor elements.

Our strategy involves two steps: first, identify all `<h3>` elements (which we'll call `title_tags` for clarity), then iterate through each to extract the anchor tag content. However, before we can parse the HTML structure, we need to initialize our BeautifulSoup parser object.

Let's create our soup object: `soup = BeautifulSoup(response.content, 'html.parser')`. Now we can proceed with finding our target elements: `title_tags = soup.find_all('h3')`. This gives us a collection of all heading elements containing our book titles.

With our title tags identified, we can extract the actual title text from each anchor element. While this could be accomplished with a traditional for loop, a list comprehension offers a more pythonic and readable solution for this straightforward transformation.

Here's our initial approach using a loop structure: we create an empty `titles` list, then iterate through each tag in `title_tags`. For each tag, we locate its child anchor element using `tag.find('a')` and extract the text content. However, we want the raw text content, not any HTML attributes, so we'll use the `.get_text()` method.
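Spelled out, that loop looks like this (using a small inline HTML sample with placeholder titles in place of the live page):

```python
from bs4 import BeautifulSoup

html = """
<h3><a title="Book One" href="#">Book O...</a></h3>
<h3><a title="Book Two" href="#">Book T...</a></h3>
"""
soup = BeautifulSoup(html, 'html.parser')
title_tags = soup.find_all('h3')

titles = []
for tag in title_tags:
    anchor = tag.find('a')            # locate the child anchor element
    titles.append(anchor.get_text())  # raw text content, not attributes

print(titles)
```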

Let's refactor this into a more elegant list comprehension: `titles = [tag.find('a').get_text() for tag in title_tags]`. This single line accomplishes the same task as our loop while remaining highly readable. The comprehension clearly states our intent: "Create a list where, for every tag in title_tags, we find the anchor element and extract its text content."

This level of conciseness is ideal for list comprehensions. Any more complex logic would warrant returning to a traditional loop for better maintainability. The key is balancing brevity with clarity—a principle that becomes increasingly important in production web scraping applications.

Next, let's tackle price extraction. Examining the page structure, we find that prices are contained within paragraph (`<p>`) elements with the CSS class `price_color`. We can target these elements specifically using BeautifulSoup's attribute-based search functionality.

We'll use another list comprehension to extract prices: `prices = [p.get_text() for p in soup.find_all('p', class_='price_color')]`. Note how we pass the class name as a parameter to `find_all()`—BeautifulSoup handles the CSS class selection seamlessly.

This approach demonstrates the flexibility of our extraction strategy. We've used a pre-defined variable (`title_tags`) for titles but incorporated the element search directly into the list comprehension for prices. Both approaches are valid; choose based on code readability and whether you'll reuse the intermediate results.

Now let's address the bonus challenges that will make our scraped data more useful for analysis and storage.

First, we need to handle title truncation. When we print our current titles, you'll notice they're cut off with ellipses (...). This truncation occurs because the visible text is shortened for display purposes, but the complete title is preserved in the anchor tag's `title` attribute.

The solution is straightforward: instead of extracting the visible text with `.get_text()`, we'll access the `title` attribute: `titles = [tag.find('a')['title'] for tag in title_tags]`. This simple change provides us with the complete, untruncated book titles—essential for accurate data analysis and user presentation.

The price formatting challenge requires more involved string manipulation. Currently, our prices are strings containing currency symbols (£), which prevents numerical operations like sorting, averaging, or mathematical comparisons. We need to convert these to clean floating-point numbers.

This transformation requires two steps: removing the currency symbol and converting to a numerical data type. We can't simply apply `.strip()` to the entire list—it must be applied to each individual string element. This calls for another list comprehension that combines string cleaning with type conversion.

Here's our approach: `prices = [float(price.strip('£')) for price in prices]`. The `.strip('£')` method removes the pound symbol from each price string, and `float()` converts the cleaned string to a numerical value. If you're unsure of the exact currency symbol, you can copy it directly from the scraped data—a practical technique when dealing with various Unicode characters.

After applying this transformation, our prices become true numerical values suitable for mathematical operations, data analysis, and storage in structured formats like pandas DataFrames. You might notice that trailing zeros disappear (22.6 instead of 22.60); this is simply how Python displays floats and doesn't affect numerical accuracy.
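If two decimal places matter for display, one option is to keep the values as floats for math and format only at output time, for example:

```python
prices = [22.6, 51.77]

# Store floats for calculations; format to two decimals only when displaying
labels = [f"£{p:.2f}" for p in prices]
print(labels)  # ['£22.60', '£51.77']
```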

With both title and price data properly formatted—complete titles as clean strings and prices as numerical values—we've created a robust dataset ready for further analysis, visualization, or integration into larger data processing pipelines. This clean, structured approach to web scraping ensures our extracted data meets professional standards for reliability and usability.

Key Takeaways

1. Always verify HTTP status code 200 before attempting to parse webpage content, to ensure successful data retrieval.
2. Use Beautiful Soup's `find_all()` method with specific tag names and class attributes for precise element targeting.
3. Extract complete titles from the HTML `title` attribute rather than the truncated display text.
4. List comprehensions provide cleaner, more concise code than traditional loops for simple extraction operations.
5. Clean extracted price data by removing currency symbols before converting strings to numerical float values.
6. Nested element searching (finding `<a>` tags within `<h3>` tags) enables extraction from complex HTML document structures.
7. The `strip()` method is essential for removing unwanted characters, such as currency symbols, from scraped text.
8. Convert cleaned price strings to the float data type to enable mathematical operations and data analysis integration.
