Keith Dawson outlines some techniques for working through large amounts of content, along with some regular expressions you can use to transform news site URLs into their printer-friendly versions (which have much less fat and far fewer ads).
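To give a rough idea of what that kind of URL rewriting looks like, here is a minimal sketch in Python. The site names and URL shapes are invented for illustration (they are not Keith Dawson's actual expressions, and they don't describe any real news site), and the `RULES` table and `printer_friendly` function are hypothetical names of my own.

```python
import re

# Hypothetical rewrite rules as (pattern, replacement) pairs.
# These URL shapes are made up for illustration; real sites each
# need their own rule, worked out by looking at their print links.
RULES = [
    # .../story/1234.html  ->  .../story/1234_print.html
    (re.compile(r"^(https?://news\.example\.com/story/\d+)\.html$"),
     r"\1_print.html"),
    # .../article/42  ->  .../article/print/42
    (re.compile(r"^(https?://daily\.example\.org/article)/(\d+)$"),
     r"\1/print/\2"),
]


def printer_friendly(url):
    """Return the rewritten URL if a rule matches, else the URL unchanged."""
    for pattern, replacement in RULES:
        new_url, count = pattern.subn(replacement, url)
        if count:
            return new_url
    return url
```

URLs that match none of the rules pass through untouched, so the function can be applied to every link on a page without needing to know in advance which sites are covered.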
I have to wonder if one of the reasons we aren’t seeing style sheets take hold is that sites beholden to advertisers, by intermingling content and structure, make it that much more difficult to do simple filtering to remove ads and other noise.
At WHUMP dot COM, if you don’t want the pretty colors, just turn off style sheets. No need to mess with HTML::Parser in your proxy server.