The word of the day is Mojibake, and I am not happy about it.
This post's short URL is http://www.whump.com/moreLikeThis/s/yvlff
For completeness: Joel on Unicode.
For future reference: writing Unicode applications with PHP.
Fieldmethods found a weblog on internationalization.
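To read closer: the claimed problem is that the 16-bit Unicode space, 65,536 code points, is smaller than the roughly 170,000 characters used in modern and ancient Chinese (mainland, RoC, and expatriate communities). Unicode is no longer limited to 16 bits, though: the code space now runs to U+10FFFF, or 1,114,112 code points.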
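As a rough check on the arithmetic, here is a small Python sketch. The 170,000 figure is taken from the text as quoted and is not verified here. The constant names are mine. The comparison with the full code space reflects current Unicode, not anything the linked piece argues.

```python
import sys

# Size of a 16-bit code space (the Basic Multilingual Plane).
BMP_SIZE = 2 ** 16

# Size of the full Unicode code space, U+0000 through U+10FFFF.
CODE_SPACE = 0x10FFFF + 1

# Character count quoted in the text; an unverified figure.
CLAIMED_HAN = 170_000

fits_in_16_bits = CLAIMED_HAN <= BMP_SIZE
fits_in_code_space = CLAIMED_HAN <= CODE_SPACE

# U+20000 is the first code point of CJK Unified Ideographs Extension B,
# which lives outside the 16-bit range.
ext_b = chr(0x20000)
utf16_bytes = len(ext_b.encode("utf-16-be"))  # surrogate pair: 4 bytes

print(BMP_SIZE, CODE_SPACE, fits_in_16_bits, fits_in_code_space, utf16_bytes)
```

The catch is that anything above U+FFFF takes a surrogate pair in UTF-16, so software that assumes one 16-bit unit per character still breaks on those ideographs.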
XML predefines only five entities (&lt;, &gt;, &amp;, &apos;, and &quot;), so if you want to use &euro; you have to have a definition for it in a DTD. Tony Coates and Zarella Rendon propose a non-DTD way around the problem for entities in text nodes using an XSLT transform library. This is the first entry in a new category, I18N. [...]
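To see the entity limitation in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not the Coates and Rendon XSLT approach. It only shows that an undeclared &euro; fails to parse, while a numeric character reference or an internal DTD declaration works. The helper name parses and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is well-formed according to the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The five predefined entities need no declaration.
predefined = "<p>&lt; &gt; &amp; &apos; &quot;</p>"

# &euro; is an HTML entity, not an XML one, so this is rejected.
undefined = "<p>Price: &euro;5</p>"

# A numeric character reference needs no DTD at all.
numeric = "<p>Price: &#8364;5</p>"

# Declaring the entity in an internal DTD subset makes &euro; legal.
with_dtd = '<!DOCTYPE p [<!ENTITY euro "&#8364;">]><p>Price: &euro;5</p>'

print(parses(predefined), parses(undefined), parses(numeric), parses(with_dtd))
print(ET.fromstring(with_dtd).text)
```

If you control the document, the numeric reference (or just the literal character in a UTF-8 file) is usually the least fuss. The DTD route matters mostly when you are handed markup full of named entities.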