Web scraping in C# using HtmlAgilityPack

Tai Bo
4 min read · Oct 16, 2022

In this post, I show an example of scraping data in C# using HtmlAgilityPack. I came across HtmlAgilityPack because I needed to get data from Zillow to analyze property deals. I was able to scrape the data I wanted without much trouble using HtmlAgilityPack with a bit of XPath, LINQ and regular expressions.

The screenshot below shows a sample Zillow listing page that contains the data I want to scrape.

I want to scrape the data under Facts and Features. The process is simple using .NET HttpClient and HtmlAgilityPack. First, I stream the HTML content of the listing page. Then, I use HtmlAgilityPack to parse the document and extract the data using XPath.
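As a rough sketch of the parse-and-extract step, the snippet below loads an HTML string into an HtmlAgilityPack HtmlDocument and selects nodes with XPath. The sample markup, the facts class name, and the FactsParser helper are made up for illustration. Zillow's real page structure is different and the XPath would need to be adapted to it.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class FactsParser
{
    // Hypothetical markup standing in for a listing page; not Zillow's real HTML.
    public const string SampleHtml = @"
<html><body>
  <ul class='facts'>
    <li>Bedrooms: 3</li>
    <li>Bathrooms: 2</li>
  </ul>
</body></html>";

    // Returns the trimmed text of every <li> under <ul class='facts'>.
    public static List<string> ExtractFacts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//ul[@class='facts']/li");
        if (nodes == null)
        {
            return new List<string>();
        }

        // Decode HTML entities and strip surrounding whitespace from each item.
        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .ToList();
    }
}
```

Calling FactsParser.ExtractFacts(FactsParser.SampleHtml) returns the two list items as plain strings. The null check matters because an XPath that matches nothing would otherwise cause a NullReferenceException.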

Stream HTML

It is quite easy to stream the HTML of a Zillow listing page using .NET HttpClient, as shown in the code snippet below.

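using System.Net.Http;
using System.Threading.Tasks;

// IZillowClient and ZillowUtil are not shown in this post.
public class ZillowClient : IZillowClient
{
    private readonly HttpClient _httpClient;

    public ZillowClient(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    // Downloads the listing page for the given address as an HTML string.
    public Task<string> GetHtml(string address)
    {
        return _httpClient.GetStringAsync(BuildUrl(ZillowUtil.NormalizeAddress(address)));
    }

    // Builds the listing URL from an already normalized address.
    private string BuildUrl(string address)
    {
        return $"https://www.zillow.com/homes/{address}";
    }
}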
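
The snippet above calls ZillowUtil.NormalizeAddress, which is not shown. As a minimal sketch of what such a helper might do, the hypothetical AddressNormalizer below uses a regular expression to turn a free-form address into a hyphen-separated URL segment. The hyphenated format is an assumption made for illustration, not a confirmed description of what Zillow expects or of the original helper.

```csharp
using System.Text.RegularExpressions;

public static class AddressNormalizer
{
    // Hypothetical stand-in for ZillowUtil.NormalizeAddress.
    // Assumption: each run of characters other than ASCII letters and digits
    // collapses into a single hyphen.
    public static string Normalize(string address)
    {
        var trimmed = address.Trim();
        var hyphenated = Regex.Replace(trimmed, "[^A-Za-z0-9]+", "-");

        // Remove any hyphen left at either end, e.g. from trailing punctuation.
        return hyphenated.Trim('-');
    }
}
```

With this sketch, an input such as "123 Main St, Seattle, WA 98101" becomes "123-Main-St-Seattle-WA-98101", which can then be passed to BuildUrl.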
