Lossless XHTML

Updated with a pointer to the latest version of Tantek’s stump speech on ‘Meaningful XHTML’, and a fix to a broken glyph.

On Tuesday night, Kevin Marks and Tantek Çelik of Technorati gave a talk at SDForum on Semantic XHTML. I'd call it Lossless XHTML instead.

The first example Kevin walked us through comes from Technorati's US elections site, where they track the most linked posts from a collection of political bloggers.

They generate the page as XHTML, so any XML parser can read it. During the two major party conventions, they read the page as a database and used it to update a Flash chart tracking the rise and fall of links to major pundit blogs.

They used the politics page as a denormalized view of their link database.
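Here is a minimal Python sketch of what "read the page as a database" can mean, using the standard library's ElementTree. The markup and its class names (`top-posts`, `post`, `inbound`) are made up for illustration. I don't know what Technorati's page actually looks like. A real page that declares the XHTML namespace would also need namespace-qualified tag names, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the class names "top-posts", "post" and "inbound"
# are assumptions for illustration, not Technorati's actual page.
PAGE = """<ol class="top-posts">
  <li class="post"><a href="http://pundit-a.example/post1">Post one</a>
    <span class="inbound">120</span></li>
  <li class="post"><a href="http://pundit-b.example/post7">Post seven</a>
    <span class="inbound">85</span></li>
</ol>"""


def read_rows(xhtml):
    """Treat each <li class="post"> as one row of a denormalized table."""
    rows = []
    for li in ET.fromstring(xhtml).iter("li"):
        if li.get("class") != "post":
            continue
        link = li.find("a")
        count = li.find("span[@class='inbound']")
        rows.append({
            "url": link.get("href"),
            "title": link.text,
            "inbound": int(count.text),
        })
    return rows


# A chart could be fed from (url, inbound-link count) pairs.
for row in read_rows(PAGE):
    print(row["url"], row["inbound"])
```

The page does double duty here. People read it in a browser, and a script reads the same bytes as rows.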

Tantek followed up with a walk-through of representing data structures in XHTML.

You'll probably recall the work Tantek, Matthew Mullenweg, and Eric Meyer did with the rel attribute of the anchor element. It produced XFN, a way to programmatically describe the relationship between a page's author and the person a link points to.
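XFN values live in `rel` as a space-separated list, so pulling them out takes very little code. Here is a rough Python sketch. The people, URLs, and the `xfn_relationships` helper are hypothetical. The values `friend`, `met`, `colleague`, and `acquaintance` are real XFN values.

```python
import xml.etree.ElementTree as ET

# Made-up blogroll fragment; the rel values are real XFN terms.
BLOGROLL = """<ul>
  <li><a href="http://alice.example/" rel="friend met">Alice</a></li>
  <li><a href="http://bob.example/" rel="colleague">Bob</a></li>
  <li><a href="http://carol.example/" rel="acquaintance">Carol</a></li>
  <li><a href="http://example.com/">No XFN here</a></li>
</ul>"""


def xfn_relationships(xhtml):
    """Map each link target to the set of XFN tokens on its anchor."""
    relations = {}
    for a in ET.fromstring(xhtml).iter("a"):
        rel = a.get("rel")
        if rel:
            # rel holds a whitespace-separated list of values.
            relations[a.get("href")] = set(rel.split())
    return relations


for url, rels in xfn_relationships(BLOGROLL).items():
    print(url, sorted(rels))
```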

But the hack that opened my eyes was representing hashes as definition lists.

XFN has a profile listing the allowed values of all the relationships. They represent it with a nested definition list.

The nested definition list becomes an associative array, a structure used all the time in scripting languages.

friendship: {
    contact: "Someone you know how to get in touch with. Often symmetric.",
    acquaintance: "Someone who you have exchanged greetings and not much (if any) more -- maybe a short conversation or two. Often symmetric.",
    friend: "Someone you are a friend to. A compatriot, buddy, home(boy|girl) that you know. Often symmetric."
}
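Here is one way a script might turn a nested definition list back into that hash, sketched in Python. The markup is an assumption written in the style of the XFN profile, not copied from it. `dl_to_dict` is a hypothetical helper. It pairs each `<dt>` with the `<dd>` after it and recurses when a `<dd>` holds its own `<dl>`.

```python
import xml.etree.ElementTree as ET

# Assumed markup in the style of the XFN profile, holding the three
# "friendship" values quoted above.
PROFILE = """<dl>
  <dt>friendship</dt>
  <dd>
    <dl>
      <dt>contact</dt>
      <dd>Someone you know how to get in touch with. Often symmetric.</dd>
      <dt>acquaintance</dt>
      <dd>Someone who you have exchanged greetings and not much (if any)
        more -- maybe a short conversation or two. Often symmetric.</dd>
      <dt>friend</dt>
      <dd>Someone you are a friend to. A compatriot, buddy,
        home(boy|girl) that you know. Often symmetric.</dd>
    </dl>
  </dd>
</dl>"""


def collapse(element):
    """Join an element's text and collapse whitespace the way a browser would."""
    return " ".join("".join(element.itertext()).split())


def dl_to_dict(dl):
    """Pair each <dt> with the <dd> after it; a <dd> holding a <dl> nests."""
    result = {}
    key = None
    for child in dl:
        if child.tag == "dt":
            key = collapse(child)
        elif child.tag == "dd" and key is not None:
            inner = child.find("dl")
            result[key] = dl_to_dict(inner) if inner is not None else collapse(child)
            key = None
    return result


profile = dl_to_dict(ET.fromstring(PROFILE))
print(profile["friendship"]["friend"])
```

The keys come from the `<dt>` text and the values from the `<dd>`, so the page is readable as a glossary and usable as data.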

OPML for blogrolls and Macintosh Property Lists contain the same sort of hierarchical data.

If you transform them into XHTML and use the class attribute to capture the roles, you losslessly convert an XML document or hash that browsers can't display into XHTML. You can then edit your blogroll or property file in an HTML editor.

<div><h3>Blogroll</h3>
<dl class="outline">
    <dt class="url"><a href="http://example.com/">http://example.com/</a></dt>
    <dd class="text">A site used as an example URL in RFCs.</dd>
    <dt class="url"><a href="http://www.w3.org/">http://www.w3.org/</a></dt>
    <dd class="text">A standards body</dd>
    ...
</dl>
</div>
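Going the other direction, from an OPML blogroll to markup like the above, might look like this in Python. It assumes a flat OPML list whose `<outline>` elements carry `text` and `htmlUrl` attributes, a common blogroll convention. Nested outline folders would need recursion that this sketch skips. `opml_to_xhtml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed input: a flat OPML blogroll whose <outline> elements carry
# "text" and "htmlUrl" attributes.
OPML = """<opml version="1.1">
  <head><title>Blogroll</title></head>
  <body>
    <outline text="A site used as an example URL in RFCs." htmlUrl="http://example.com/"/>
    <outline text="A standards body" htmlUrl="http://www.w3.org/"/>
  </body>
</opml>"""


def opml_to_xhtml(opml):
    """Render each outline as a dt/dd pair, with class naming each field's role."""
    src = ET.fromstring(opml)
    div = ET.Element("div")
    ET.SubElement(div, "h3").text = src.findtext("head/title", default="Blogroll")
    dl = ET.SubElement(div, "dl", {"class": "outline"})
    for outline in src.iter("outline"):
        url = outline.get("htmlUrl", "")
        dt = ET.SubElement(dl, "dt", {"class": "url"})
        ET.SubElement(dt, "a", {"href": url}).text = url
        ET.SubElement(dl, "dd", {"class": "text"}).text = outline.get("text", "")
    return ET.tostring(div, encoding="unicode")


xhtml = opml_to_xhtml(OPML)
print(xhtml)
```

Every OPML attribute the sketch reads ends up in either an `href` or a classed element, so nothing it uses is thrown away.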

A crawler or feedreader could parse that into a hash, and an XSLT style sheet could turn it back into OPML. There's already an XSLT transform that turns an iTunes Property List into RDF, so going to and from XHTML would only take a modified version of that transform.
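The round trip in that paragraph could be sketched like this. The post suggests XSLT for the OPML step. This version swaps in Python's ElementTree instead, so it illustrates the idea rather than reproducing that transform. `blogroll_to_hash` and `hash_to_opml` are hypothetical helpers, and the OPML attribute names (`text`, `htmlUrl`) are an assumption.

```python
import xml.etree.ElementTree as ET

# The blogroll markup from the post, trimmed to two entries.
XHTML = """<div><h3>Blogroll</h3>
<dl class="outline">
  <dt class="url"><a href="http://example.com/">http://example.com/</a></dt>
  <dd class="text">A site used as an example URL in RFCs.</dd>
  <dt class="url"><a href="http://www.w3.org/">http://www.w3.org/</a></dt>
  <dd class="text">A standards body</dd>
</dl>
</div>"""


def blogroll_to_hash(root):
    """Map each class="url" <dt> to the class="text" <dd> that follows it."""
    entries = {}
    url = None
    for child in root.find(".//dl[@class='outline']"):
        role = child.get("class")
        if child.tag == "dt" and role == "url":
            url = child.find("a").get("href")
        elif child.tag == "dd" and role == "text" and url is not None:
            entries[url] = child.text or ""
            url = None
    return entries


def hash_to_opml(title, entries):
    """Rebuild OPML from the hash (standing in for the XSLT step)."""
    opml = ET.Element("opml", {"version": "1.1"})
    ET.SubElement(ET.SubElement(opml, "head"), "title").text = title
    body = ET.SubElement(opml, "body")
    for url, text in entries.items():
        ET.SubElement(body, "outline", {"text": text, "htmlUrl": url})
    return ET.tostring(opml, encoding="unicode")


root = ET.fromstring(XHTML)
blogroll = blogroll_to_hash(root)
opml = hash_to_opml(root.findtext("h3"), blogroll)
print(opml)
```

If every url/text pair comes back out of the OPML unchanged, the XHTML lost nothing along the way.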

The style feels like literate programming, but the comments and code are one and the same.

These techniques are lossless transforms of data into XHTML for presentation.

Kevin and Tantek are looking for other common formats and structures they can represent in XHTML. At this year's Foo Camp, they started attacking the iCalendar format.
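I don't know how Kevin and Tantek are mapping iCalendar. The same class-attribute trick suggests one guess: keep each event property's name as the class of the element holding its value. Here is a Python sketch. It assumes a single VEVENT with no folded lines and no property parameters. `vevent_to_xhtml` is hypothetical and the event is made up.

```python
import xml.etree.ElementTree as ET

# Made-up, abbreviated event: required properties such as UID and
# DTSTAMP are omitted, and there are no folded lines or parameters
# (e.g. DTSTART;TZID=...).
ICS = """BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Example//Sketch//EN
BEGIN:VEVENT
SUMMARY:Example event
DTSTART:20040101T190000
LOCATION:Example venue
END:VEVENT
END:VCALENDAR"""


def vevent_to_xhtml(ics):
    """Wrap each property of the first VEVENT in a <span> classed by its name."""
    div = None
    for line in ics.splitlines():
        name, _, value = line.partition(":")
        if name == "BEGIN" and value == "VEVENT":
            div = ET.Element("div", {"class": "vevent"})
        elif name == "END" and value == "VEVENT":
            break
        elif div is not None:
            # SUMMARY -> class="summary", DTSTART -> class="dtstart", ...
            ET.SubElement(div, "span", {"class": name.lower()}).text = value
    if div is None:
        raise ValueError("no VEVENT found")
    return ET.tostring(div, encoding="unicode")


print(vevent_to_xhtml(ICS))
```

Because each property name survives as a class, a script could rebuild the original VEVENT lines from the spans.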

Tantek’s site has examples of this in action.

I believe this addresses some of the complaints people raise about the Semantic Web: XHTML provides a common language, and profiles such as XFN provide common vocabularies.
