Adobe XMP Extractor

Over at w3.org, Dan Brickley wrote a web tool to extract Adobe XMP data from Adobe files such as PDFs and Photoshop files.

I asked him about the script, and he wrote that the key is a Perl regular expression applied to the document:


m/id='W5M0MpCehiHzreSzNTczkc9d'\s*(bytes=')*([^']*)'?\?>(.*)<\?xpacket end='([^']*)'\?>/sg

The XMP is returned in $3, the third parenthesized group.
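For readers without Perl to hand, here is a rough Python port of the same expression. It is a sketch, not Dan's tool. It assumes the file is read as raw bytes and that the packet uses single-quoted attributes, as the pattern expects. `re.DOTALL` stands in for Perl's `/s`, and `group(3)` corresponds to `$3`. The function name `extract_xmp` is illustrative only.

```python
import re
from typing import Optional

# Python port of the Perl pattern quoted above. re.DOTALL stands in for
# Perl's /s modifier, so "." also matches newlines; the pattern is a bytes
# pattern because PDF and Photoshop files are binary.
XPACKET_RE = re.compile(
    rb"id='W5M0MpCehiHzreSzNTczkc9d'\s*(bytes=')*([^']*)'?\?>"
    rb"(.*)<\?xpacket end='([^']*)'\?>",
    re.DOTALL,
)


def extract_xmp(data: bytes) -> Optional[bytes]:
    """Return capture group 3 (Perl's $3) of the first match, or None."""
    match = XPACKET_RE.search(data)
    return match.group(3) if match else None


# Hypothetical usage on a file read in binary mode:
# with open("example.pdf", "rb") as fh:
#     print(extract_xmp(fh.read()))
```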

You’ll want to experiment with this expression, because in practice it’s a little greedy with respect to line endings: in some documents I get back the XMP plus the rest of the PDF file. Line endings are the culprit, I think.
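One likely factor, offered as a guess rather than a diagnosis, is the `/s` modifier. It lets `.` match newlines, so the greedy `(.*)` runs on to the *last* `<?xpacket end=...?>` in the file rather than the first. If a file carries more than one XMP packet, `$3` then swallows everything between them. A lazy `(.*?)` stops at the first closing marker instead. The Python sketch below compares the two on a made-up two-packet input. The names `packet` and `all_packets` and the sample bytes are hypothetical.

```python
import re

# The same pattern, once with the original greedy (.*) and once with a lazy
# (.*?) as the third group. re.DOTALL mirrors Perl's /s.
HEADER = rb"id='W5M0MpCehiHzreSzNTczkc9d'\s*(bytes=')*([^']*)'?\?>"
TRAILER = rb"<\?xpacket end='([^']*)'\?>"
GREEDY_RE = re.compile(HEADER + rb"(.*)" + TRAILER, re.DOTALL)
LAZY_RE = re.compile(HEADER + rb"(.*?)" + TRAILER, re.DOTALL)


def packet(body: bytes) -> bytes:
    """Build a minimal, hypothetical XMP packet wrapping `body`."""
    return (b"<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>\n"
            b"<x:xmpmeta>" + body + b"</x:xmpmeta>\n<?xpacket end='w'?>")


def all_packets(pattern: re.Pattern, data: bytes) -> list:
    """Collect group 3 from every match, like Perl's /g in list context."""
    return [m.group(3) for m in pattern.finditer(data)]


# Two packets separated by unrelated content, as might happen in a PDF.
data = packet(b"A") + b"\nstream\n<binary data>\nendstream\n" + packet(b"B")

greedy = all_packets(GREEDY_RE, data)  # one match spanning both packets
lazy = all_packets(LAZY_RE, data)      # one match per packet
```

The lazy version also lets `finditer` (the analogue of Perl's `/g`) return each packet separately. The pattern still expects single-quoted attributes (`id='...'`, `end='...'`), so packets written with double quotes will not match without further changes.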
