TagSoup

John Cowan mentions TagSoup, which “parses HTML as it is found in the wild: nasty and brutish, though quite often far from short.”

It doesn’t repair the HTML source itself. Instead, it reports the document as a stream of SAX events describing properly nested elements and attributes, which you can catch and process like any other SAX input.
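Because the output is ordinary SAX, the processing side is just a standard `ContentHandler`. The sketch below collects `href` values from `<a>` elements. It is illustrative only: the class name `LinkCollector` and its `collect` helper are hypothetical. So that it runs without extra jars, `main` uses the JDK's own SAX parser as a stand-in. That parser accepts only well-formed input. With TagSoup on the classpath, you could pass `new org.ccil.cowan.tagsoup.Parser()` (TagSoup's `XMLReader` implementation) to `collect` instead, and the handler would stay unchanged.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example handler: gathers link targets from SAX events.
public class LinkCollector extends DefaultHandler {
    // href values from <a> elements, in document order.
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A non-namespace-aware parser may leave localName empty, so fall back to qName.
        String name = localName.isEmpty() ? qName : localName;
        if (name.equalsIgnoreCase("a")) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    public List<String> getLinks() {
        return links;
    }

    // Runs any SAX XMLReader over the markup and returns the collected links.
    public static List<String> collect(XMLReader reader, String markup) throws Exception {
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.getLinks();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader: the JDK's XML parser, which rejects malformed HTML.
        // Assumption: with TagSoup available, new org.ccil.cowan.tagsoup.Parser()
        // could be passed here to handle unbalanced, real-world markup.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String xhtml = "<html><body><p><a href=\"a.html\">A</a> and <a href=\"b.html\">B</a></p></body></html>";
        System.out.println(collect(reader, xhtml)); // prints [a.html, b.html]
    }
}
```

The handler never inspects the raw text, so any parser that emits SAX events can feed it. That separation is what lets a tag-soup parser sit in front of existing XML-oriented code.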
