The State of Screen Scraping

Amid all the recent excitement about using hAtom to generate feeds, I’d forgotten that I had written about the concept nearly three years ago, when I was getting ready to talk about syndication at Seybold SF.
While providing a purpose-built (ahem, Atom) feed remains the best answer, hAtom offers two things we didn’t have three years ago:

A [...]

XSLT Friday

A couple of weeks ago, Elliotte Rusty Harold asked why we don’t embed XML vocabularies in XHTML instead of using microformats. He gave the example of his conference calendar: the events are in a custom XML format embedded in XHTML, and the page uses an XSLT processing instruction to convert it all to XHTML for display in [...]

XML Nanny 2.0

Todd’s released a new version of XML Nanny, his XML parsing and validation tool for Mac OS X. This version supports an extensive list of validators: DTD, XML Schema, RELAX NG (XML and simplified syntax), and Schematron.

But what about the kittens?

Bruce Eckel: Every time someone creates a new XML-based language, God (performs some unspeakable act).

It’s the Pipes

Strata Rose Chalup: The Unix pipe culture of sending one program’s output to the next program’s input. I think this explains why Web 2.x is taking off like gangbusters, particularly the mashups. It’s XML and regularized schema.

Subscribing To hAtom Feeds With NetNewsWire

hAtom’s not useful until you have a way to get from a blog’s summary page to an Atom feed. Chris Casciano wrote an AppleScript wrapping xsltproc that reads an hAtom page, applies the hAtom2Atom.xsl transform, and hands the result back to NetNewsWire.
Note: Scott Reynen points out that the script doesn’t work with the NNW 2.1 beta.
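
Chris’s script relies on xsltproc and hAtom2Atom.xsl. To give a rough idea of what such a transform does, here is a minimal Python sketch using only the standard library’s ElementTree. It is not his script or the XSL file: it assumes the page is well-formed XHTML, maps only the hentry, entry-title, published and entry-content class names plus the rel="bookmark" permalink, and skips elements a valid Atom feed requires (such as id and updated). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def has_class(el, name):
    """True if el's space-separated class attribute contains name."""
    return name in el.get("class", "").split()


def first_with_class(root, name):
    """First element (root included, document order) carrying the class, else None."""
    return next((el for el in root.iter() if has_class(el, name)), None)


def text_of(el):
    """All text inside el, whitespace-trimmed."""
    return "".join(el.itertext()).strip()


def hatom_to_atom(xhtml, feed_title):
    """Build a minimal Atom <feed> with one <entry> per class="hentry" element.

    Sketch only: expects well-formed XHTML and maps just entry-title,
    published, entry-content and the rel="bookmark" permalink.
    """
    page = ET.fromstring(xhtml)
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = feed_title
    for hentry in [el for el in page.iter() if has_class(el, "hentry")]:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        title = first_with_class(hentry, "entry-title")
        if title is not None:
            ET.SubElement(entry, f"{{{ATOM}}}title").text = text_of(title)
        published = first_with_class(hentry, "published")
        if published is not None:
            # hAtom pages often put the ISO 8601 date in an <abbr title="...">
            ET.SubElement(entry, f"{{{ATOM}}}published").text = (
                published.get("title") or text_of(published))
        link = next((el for el in hentry.iter()
                     if "bookmark" in el.get("rel", "").split()), None)
        if link is not None and link.get("href"):
            ET.SubElement(entry, f"{{{ATOM}}}link", rel="alternate", href=link.get("href"))
        content = first_with_class(hentry, "entry-content")
        if content is not None:
            # plain text only; a fuller converter would carry the XHTML markup over
            ET.SubElement(entry, f"{{{ATOM}}}content", type="text").text = text_of(content)
    return feed
```

Calling `ET.tostring(hatom_to_atom(page_text, "My Blog"), encoding="unicode")` returns the feed as a string. A real converter would also have to cope with pages that aren’t well-formed XML, which ElementTree rejects with a ParseError.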

E4X: Mac OS X Setup

E4X, mentioned earlier, is an XML processing extension for JavaScript. It’s available for Rhino, the Mozilla implementation of JavaScript in Java. Here’s how to get to it from the shell in Mac OS X.

Download Rhino 1.6R2 (the current release as of this writing).
Download Apache XBeans.
Uncompress both archives.
Create ~/Library/Java/Extensions if you don’t already have it.
Copy js.jar from Rhino, xbean.jar [...]

From JSON to XPath

Aaron of Montreal considers JSON:
I still don’t like JSON. It works and working code always win but its arrival as the next Best Thing Evar on the Intarweb only confirms that it’s a hack.

And, like me, Aaron would like to see XPath in the browser.
“Coincidentally, I tend to think of XPath the same way its benefactors [...]

XSLT script to generate album listing from iTunes XML

[ via Paddy Dwyer ] An XSLT script that generates an album listing from iTunes’ XML library file.
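
For comparison with the XSLT approach, here is a minimal Python sketch of the same idea. iTunes’ XML library file is a property list, so the standard library’s plistlib can read it directly, and tracks can then be grouped by album. It assumes the usual layout (a top-level Tracks dictionary whose entries carry Name, Artist, Album Artist, Album and Track Number keys); the function names are hypothetical, and this is not the linked XSLT script.

```python
import plistlib
from collections import defaultdict


def album_listing(library_bytes):
    """Return {(artist, album): [track names]} from an iTunes library XML plist."""
    library = plistlib.loads(library_bytes)
    albums = defaultdict(list)
    for track in library.get("Tracks", {}).values():
        album = track.get("Album")
        if album is None:
            continue  # skip tracks without album metadata
        artist = track.get("Album Artist") or track.get("Artist", "Unknown Artist")
        albums[(artist, album)].append((track.get("Track Number", 0), track.get("Name", "")))
    # order each album's tracks by their track number
    return {key: [name for _, name in sorted(tracks)] for key, tracks in albums.items()}


def format_listing(albums):
    """One 'Artist - Album (N tracks)' line per album, sorted by artist then album."""
    return "\n".join(
        f"{artist} - {album} ({len(albums[(artist, album)])} tracks)"
        for artist, album in sorted(albums)
    )
```

Usage might look like `print(format_listing(album_listing(open(path, "rb").read())))`, where `path` points at your library file.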

Crawling Back to 1.0

This weblog is a hobby, I must remind myself.
One thing I knew going into WordPress was that, for reasons opaque to me, it supports Atom 0.3 but not 1.0.
However, there’s a patch that adds 1.0 support, though it hasn’t been committed.
There’s also a template-only patch.
Both of these escape the body of a [...]

A Couple of XSLT/XPath Links

Edward Vielmetti took a REST web service for querying the Ann Arbor library’s catalog and wrote an XSL transform that produces a page resembling a card from a card catalog.

Todd Ditchendorf released a Mac application called AquaPath that lets you run XPath expressions against XML and see the results highlighted in the source document.
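
AquaPath is a GUI tool, but for quick XPath experiments from a script, Python’s ElementTree understands a limited subset of XPath (child and descendant steps, attribute predicates, positions). It does not support most XPath functions, such as count() or contains(); a full XPath 1.0 engine such as lxml’s would be needed for those. A minimal sketch against a made-up catalog document; `run_xpath` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# A made-up document to query.
CATALOG = """<catalog>
  <book lang="en"><title>First</title></book>
  <book lang="fr"><title>Second</title></book>
  <book lang="en"><title>Third</title></book>
</catalog>"""


def run_xpath(xml_text, path):
    """Evaluate an ElementTree-supported XPath subset and return matched elements' text."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(path)]


print(run_xpath(CATALOG, ".//book[@lang='en']/title"))  # ['First', 'Third']
print(run_xpath(CATALOG, "./book[2]/title"))             # ['Second']
```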

Computing word count in XML documents

Uche Ogbuji has a tip on using XSLT with the EXSLT extensions to get the word count of an XML document.
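
Uche’s tip uses XSLT and EXSLT. As a rough Python equivalent (a swapped-in technique, not his), one can concatenate the document’s text nodes in document order, much like XPath’s string value of the root element, and split on runs of whitespace. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def word_count(xml_text):
    """Count whitespace-separated words in all text content of an XML document.

    Text nodes are concatenated in document order (comments are dropped by the
    parser), then split on runs of whitespace.
    """
    root = ET.fromstring(xml_text)
    return len("".join(root.itertext()).split())
```

Because text nodes are joined without separators, `<a>one<b>two</b></a>` counts as a single word ("onetwo"), which matches how the text reads; whether that is right depends on the vocabulary being counted.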

Unicode spaces

Unicode contains a whole menagerie of space characters, from wide to skinny; the thin ones, for example, are used when setting initials. See also: More useful punctuation in Unicode.
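
A small Python sketch that names a few of these characters with the standard library’s unicodedata module and joins initials with U+2009 THIN SPACE. The convention shown for initials (thin spaces between initials, an ordinary space before the surname) is one common typographic choice, assumed here for illustration; the helper names are hypothetical.

```python
import unicodedata

# Some of Unicode's space characters, roughly from wide to skinny.
SPACES = ["\u2003", "\u2002", "\u2009", "\u200A"]

THIN_SPACE = "\u2009"


def describe_spaces(chars):
    """Return 'U+XXXX NAME' strings for each character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in chars]


def set_initials(initials, surname):
    """Join initials with thin spaces, then an ordinary space before the surname."""
    return THIN_SPACE.join(initials) + " " + surname


for line in describe_spaces(SPACES):
    print(line)  # U+2003 EM SPACE, U+2002 EN SPACE, U+2009 THIN SPACE, U+200A HAIR SPACE
print(set_initials(["E.", "B."], "White"))
```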

On Word Processing

Nathan Young: Using word is much easier than using emacs… but using word and keeping it from breaking things is about as hard as using emacs.

Safari Guide 1.3 adds XSLT

Todd’s 1.3 release of Safari Guide adds support for XSL Transforms.