More on XMP Extraction

Dan Lyke wrote about the Perl regular expression for extracting XMP from Adobe files:

In the XML extractor, try replacing the grouping for $3, currently “(.*)”, with “(.*?)”. “*” and “+” are greedy by default, so “.*” grabs as much as it can and the rest of the pattern ends up matching the last occurrence of the end marker in the document. The “?” makes them non-greedy. Almost always people mean “.*?” rather than “.*”.
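To make the difference concrete, here is a minimal sketch with made-up input (the `<a>` tags and the variable names are just for illustration, not part of the extractor):

```perl
use strict;
use warnings;

# Hypothetical input containing two closing markers.
my $text = "<a>one</a> filler <a>two</a>";

# Greedy: ".*" runs to the end, then backs up to the LAST "</a>".
my ($greedy) = $text =~ m/<a>(.*)<\/a>/s;

# Non-greedy: ".*?" stops at the FIRST "</a>" it can.
my ($lazy) = $text =~ m/<a>(.*?)<\/a>/s;

print "greedy: $greedy\n";   # greedy: one</a> filler <a>two
print "lazy:   $lazy\n";     # lazy:   one
```

The greedy capture swallows everything between the first opening tag and the last closing tag, which is the same thing that happens when a file holds more than one packet.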

So that expression becomes:


m/id='W5M0MpCehiHzreSzNTczkc9d'\s* \
(bytes=')*([^']*)'?\?> \
(.*?)<\?xpacket end='([^']*)'\?>/sg

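Note that I split the expression across three lines for readability. The trailing backslashes only mark the line breaks and are not part of the pattern, so join the lines back together before using it.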
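As a rough sketch of the expression in use, the following joins it onto one line and runs it over some sample data. The two tiny packets and their `<x>` payloads are invented for the test and are not real XMP; the variable names are my own.

```perl
use strict;
use warnings;

# Hypothetical sample data: two minimal packets with made-up payloads.
my $data = join '',
    "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>",
    "<x>first</x>",
    "<?xpacket end='w'?>",
    "junk between packets",
    "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>",
    "<x>second</x>",
    "<?xpacket end='r'?>";

# The expression from above on a single line. With /g in a while
# loop, each iteration picks up the next packet in the data.
my @payloads;
while ($data =~ m/id='W5M0MpCehiHzreSzNTczkc9d'\s*(bytes=')*([^']*)'?\?>(.*?)<\?xpacket end='([^']*)'\?>/sg) {
    push @payloads, $3;    # $3 is the packet body, $4 the end attribute
}

print scalar(@payloads), " packet(s) found\n";
print "$_\n" for @payloads;
```

With the non-greedy “(.*?)” this finds both packets separately. Swapping the greedy “(.*)” back in produces a single match whose $3 runs from the first packet's body through to the second's.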
