Web scraping in C# using HtmlAgilityPack

Tai Bo
Oct 16, 2022

In this post, I show an example of scraping data in C# using HtmlAgilityPack. I came across HtmlAgilityPack because I needed to get data from Zillow to analyze property deals. I was able to scrape the data I wanted without much trouble using HtmlAgilityPack with a bit of XPath, LINQ and regular expressions.

Below I show a screenshot of a sample Zillow listing page which contains the data I want to scrape.

I want to scrape the data under Facts and Features. The process is simple using the .NET HttpClient and HtmlAgilityPack. First, I download the HTML content. Then, I use HtmlAgilityPack to parse the document and extract the data using XPath.

Stream HTML

It is quite easy to download the HTML of a Zillow listing page using the .NET HttpClient, as shown in the code snippet below.

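using System.Net.Http;
using System.Threading.Tasks;

public class ZillowClient : IZillowClient
{
    private readonly HttpClient _httpClient;

    public ZillowClient(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    // Downloads the listing page for the given address as a single string.
    public Task<string> GetHtml(string address)
    {
        return _httpClient.GetStringAsync(BuildUrl(ZillowUtil.NormalizeAddress(address)));
    }

    // Expects an address that has already been normalized (hyphen-separated).
    private string BuildUrl(string address)
    {
        return $"https://www.zillow.com/homes/{address}";
    }
}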

In the code above, I use the GetStringAsync method to download the HTML content into a string in one line. Here is an example of a URL that Zillow understands: https://www.zillow.com/homes/7777-Alder-Ave-Fontana-CA-92336. To build it, I replace the commas and spaces in the address with hyphens, as shown in the code snippet below:

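// Turns commas into spaces, collapses runs of whitespace into one space,
// then replaces each remaining space with a hyphen.
public static string NormalizeAddress(string address)
{
    return Regex.Replace(address.Replace(",", " "), @"\s+", " ").Replace(" ", "-");
}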
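To make the transformation concrete, here is a minimal, self-contained sketch that runs the same normalization on the sample address and prints the resulting URL. Placing BuildUrl inside ZillowUtil and the Program class are my assumptions for the sake of a runnable example; in the post, BuildUrl is a private method of ZillowClient.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZillowUtil
{
    // Same logic as the snippet above: commas become spaces, runs of
    // whitespace collapse to one space, and each space becomes a hyphen.
    public static string NormalizeAddress(string address)
    {
        return Regex.Replace(address.Replace(",", " "), @"\s+", " ").Replace(" ", "-");
    }

    // Assumption: moved here from ZillowClient so the sketch is self-contained.
    public static string BuildUrl(string address)
    {
        return $"https://www.zillow.com/homes/{NormalizeAddress(address)}";
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(ZillowUtil.BuildUrl("7777 Alder Ave, Fontana, CA 92336"));
        // prints https://www.zillow.com/homes/7777-Alder-Ave-Fontana-CA-92336
    }
}
```

One thing to watch for: an address with leading or trailing whitespace ends up with a leading or trailing hyphen, so it may be worth calling Trim() on the input before normalizing it.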

Parsing data using HtmlAgilityPack and XPath

Once I have the HTML content, I load it into an HtmlDocument object and extract the data into a model I can use.

public ListingDetail Parse(string html)
{
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

    var listingDetail = new ListingDetail()…
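The general pattern of combining HtmlAgilityPack with XPath, LINQ and a regular expression can be sketched as follows. This is only an illustration under stated assumptions, not the code from the truncated listing above: the sample markup, the XPath expression, the "name: value" text format and the FactsParserSketch class are all hypothetical, and Zillow's real page structure is different. It requires the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class FactsParserSketch
{
    // Hypothetical markup for illustration only; not Zillow's real HTML.
    public const string SampleHtml = @"
        <div>
          <h4>Facts and features</h4>
          <ul class='facts'>
            <li>Bedrooms: 4</li>
            <li>Bathrooms: 3</li>
            <li>Year built: 2005</li>
          </ul>
        </div>";

    public static Dictionary<string, string> ParseFacts(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var items = htmlDoc.DocumentNode.SelectNodes("//ul[@class='facts']/li");
        if (items == null)
        {
            return new Dictionary<string, string>();
        }

        // LINQ projects each <li> to its decoded text; the regular expression
        // then splits "name: value" into two named groups.
        return items
            .Select(li => HtmlEntity.DeEntitize(li.InnerText).Trim())
            .Select(text => Regex.Match(text, @"^(?<name>[^:]+):\s*(?<value>.+)$"))
            .Where(m => m.Success)
            .ToDictionary(m => m.Groups["name"].Value, m => m.Groups["value"].Value);
    }

    public static void Main()
    {
        foreach (var fact in ParseFacts(SampleHtml))
        {
            Console.WriteLine($"{fact.Key} = {fact.Value}");
        }
    }
}
```

The null check matters because SelectNodes does not return an empty collection when the XPath finds nothing. A scraper that skips the check fails with a NullReferenceException as soon as the page layout changes.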