Web Scraping Part 2: XPath
XPath for Web Scraping
What XPath Is
XML Path Language — query language for navigating HTML/XML document trees.
Install XPath Helper
Chrome extension that helps you build and test XPath expressions live on a page.
Location Paths
/html/head/title navigates from the document root to the title element, much like a folder path on a computer.
Use with lxml or Scrapy
Python's lxml.html.fromstring(html).xpath('//div[@class="X"]') extracts data quickly.
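As a rough sketch of the two ideas above, the snippet below first walks a location path down from the root and then runs a document-wide search with an attribute filter. It uses the standard library's xml.etree.ElementTree as a stand-in, which supports only a subset of XPath, so that it runs without extra installs. The sample markup and the class value "X" are placeholders, not taken from a real site. With lxml, the same expressions would go through .xpath() on the parsed tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (placeholder content).
SAMPLE = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <div class="X">first</div>
    <div class="Y">skipped</div>
    <section><div class="X">second</div></section>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)  # root is the <html> element

# Location path: step from <html> into <head>, then <title>, like folders.
title = root.find("head/title").text

# './/' searches all descendants; the predicate keeps divs whose class is "X".
matches = [div.text for div in root.findall(".//div[@class='X']")]

print(title)
print(matches)
```

ElementTree only accepts paths relative to an element, which is why the search begins with .// rather than //.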
In Part 1 of this web scraping series, we covered the basics of HTML nodes and syntax, and used Beautiful Soup to scrape data science article titles from a website called DataTau. In this article, we will cover another useful web scraping tool, XPath Helper. Before we can use it, though, we first need to understand what XPath is.


The first thing to do is right-click anywhere on the website and choose “Inspect”. This opens a panel showing the page’s HTML document. Next, click the icon in the top-left corner of that panel that looks like a mouse pointer hovering over a square. With it selected, click the title of an apartment post. This highlights the section of the HTML we want to scrape.
Let’s break down the query //p[@class="result-info"]/a[@class]. The ‘//’ means we search the whole document for a p element whose class attribute is “result-info”. Once it has been found, a forward slash steps down to its child element. That child is an a element that has a class attribute, which we can reference by typing a[@class]. This grabs all the titles on the page. Pretty simple and powerful, right? You can copy and paste the results into an Excel spreadsheet.
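The same query can also be sketched in Python. Treat this as an illustration under stated assumptions: the markup is invented to mirror the structure described above (a p with class "result-info" whose child a has a class attribute), and the class name "result-title", the link targets, and the helper name extract_titles are all hypothetical. It uses the standard library's xml.etree.ElementTree rather than XPath Helper or lxml, so the search starts with .// instead of //.

```python
import xml.etree.ElementTree as ET

# Invented listing markup; only the "result-info" class comes from the text.
LISTINGS = """
<ul>
  <li><p class="result-info"><a class="result-title" href="/apt/1">Sunny studio</a></p></li>
  <li><p class="result-info"><a class="result-title" href="/apt/2">Two-bedroom loft</a></p></li>
  <li><p class="other"><a class="result-title" href="/ad">Sponsored</a></p></li>
  <li><p class="result-info"><a href="/misc">No class attribute</a></p></li>
</ul>
"""


def extract_titles(markup):
    """Return the text of each <a class=...> directly under a result-info <p>."""
    root = ET.fromstring(markup)
    # p[@class='result-info'] filters by attribute value;
    # a[@class] keeps only child links that have a class attribute at all.
    links = root.findall(".//p[@class='result-info']/a[@class]")
    return [link.text for link in links]


titles = extract_titles(LISTINGS)
print(titles)
```

The third and fourth entries are skipped because the first sits under a p with a different class and the second has no class attribute on its link. The resulting list could then be written out with Python's csv module and opened in Excel instead of copying and pasting by hand.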