More on XMP Extraction

Dan Lyke wrote about the Perl regular expression for extracting XMP from Adobe files:

In the XML extractor, try replacing the grouping for $3, currently “(.*)”, with “(.*?)”. “*” and “+” are greedy by default, so “(.*)” will run on to the last occurrence of the closing xpacket marker in the document instead of stopping at the first. The “?” makes them non-greedy. Almost always people mean “.*?” rather than “.*”.

So that expression becomes:


m/id='W5M0MpCehiHzreSzNTczkc9d'\s* \
(bytes=')*([^']*)'?\?> \
(.*?)<\?xpacket end='([^']*)'\?>/sg

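Note that I split the expression across three lines for display only. The trailing backslashes and the spaces before them are not part of the pattern, so join the three lines into one before using it.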
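To see the difference the “?” makes, here is a minimal sketch that runs both versions against a buffer holding two packets. The sample data is made up for illustration and is not real Adobe output, and the variables $head and $tail are just my way of sharing the unchanged parts of the expression between the two versions; the capture groups still number $1 through $4 as in the original.

```perl
use strict;
use warnings;

# Hypothetical sample buffer with two XMP packets (made-up data).
my $data = join '',
    "junk<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>",
    "<x:xmpmeta>first</x:xmpmeta>",
    "<?xpacket end='w'?>",
    "more junk<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>",
    "<x:xmpmeta>second</x:xmpmeta>",
    "<?xpacket end='r'?>";

# The unchanged start and end of the expression, split out for readability.
my $head = qr/id='W5M0MpCehiHzreSzNTczkc9d'\s*(bytes=')*([^']*)'?\?>/;
my $tail = qr/<\?xpacket end='([^']*)'\?>/;

my (@greedy, @lazy);

# Greedy (.*): $3 swallows everything up to the LAST end marker.
while ($data =~ /$head(.*)$tail/sg) {
    push @greedy, $3;
}

# Non-greedy (.*?): $3 stops at the FIRST end marker, so /g finds each packet.
while ($data =~ /$head(.*?)$tail/sg) {
    push @lazy, $3;
}

printf "greedy:     %d match(es)\n", scalar @greedy;
printf "non-greedy: %d match(es)\n", scalar @lazy;
print "  $_\n" for @lazy;
```

With this sample the greedy version reports a single match whose $3 spans both packets, while the non-greedy version reports two matches, one per packet.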
