Danny Ayers posted a nice idea for an application in the Greasemonkey/browser-enhancement style: a proxy that scrubs a page to well-formed markup, then applies a series of XSLT transforms to it.
When I was at 2Roam, we had that application. It was called Catalyst. It executed the JavaScript on a page or frameset and cleaned up the results. You got a compound document out the other side with all the HTML, plus all the information from the HTTP headers and cookies.
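
To give a rough feel for what a compound document like that might look like, here is a minimal Python sketch. It is not Catalyst's actual format: the function name `build_compound_document` and the element names (`compound`, `headers`, `cookies`, `body`) are invented for illustration, the sample values are made up, and it assumes the markup has already been scrubbed to well-formed XML.

```python
import xml.etree.ElementTree as ET
from http.cookies import SimpleCookie


def build_compound_document(headers, cookie_header, body_xhtml):
    """Wrap headers, cookies, and cleaned-up markup in a single XML tree.

    Hypothetical layout: the element names are assumptions made for this
    sketch, not the format Catalyst produced.
    """
    root = ET.Element("compound")

    # One <header name="..."> element per HTTP header.
    headers_el = ET.SubElement(root, "headers")
    for name, value in headers.items():
        ET.SubElement(headers_el, "header", name=name).text = value

    # Parse a Cookie-style header string into individual cookies.
    cookies_el = ET.SubElement(root, "cookies")
    jar = SimpleCookie()
    jar.load(cookie_header)
    for name, morsel in jar.items():
        ET.SubElement(cookies_el, "cookie", name=name).text = morsel.value

    # Assumes the page is already well-formed; fromstring raises otherwise.
    body_el = ET.SubElement(root, "body")
    body_el.append(ET.fromstring(body_xhtml))
    return root


# Made-up sample input, purely for illustration.
doc = build_compound_document(
    {"Content-Type": "text/html"},
    "session=abc123; theme=dark",
    "<html><body><p>Hello</p></body></html>",
)
print(ET.tostring(doc, encoding="unicode"))
```

The appeal of putting everything in one tree is that a downstream XSLT transform can match on a header or cookie as easily as on the page markup itself.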
