I have had to write an HTML parser a few times already, and I have always used HTML::Parser, because it is simple and because I never bothered to do it another way. Ideally, I wanted to do the parsing with XPath, because that way I can find what I need with one or two queries. Unfortunately, XPath only works on well-formed XML documents, so unless the page I am parsing is XHTML, I couldn't do much. I also thought about converting the HTML to well-formed XHTML and running XPath on that, but I didn't know an easy way to do that either.

Well, now I do. I can use the HTML::TreeBuilder module from the HTML-Tree CPAN distribution. It is easy enough to use the module itself for parsing:

    use HTML::TreeBuilder;

    # Parse the file into a tree of HTML::Element objects.
    my $tree = HTML::TreeBuilder->new_from_file("test.html");

    # Find every <img> element whose src attribute contains "thumbnail".
    my @img = $tree->look_down(
        '_tag', 'img',
        sub { ($_[0]->attr('src') // '') =~ m!thumbnail! },
    );

However, as I would rather use XPath, here is how to easily convert t