From: Jukka K. Korpela (email@example.com)
Date: Mon Dec 28 2009 - 12:50:20 CST
- - wrote:
> 2) For code points in planes 0 to 2 (BMP, SMP, SIP) filter the
> * 0xFEFF (byte order mark, no use in UTF-8 and may be
> potentially dangerous if converted later to UTF-16 without proper
Others have commented on the big picture, which remains somewhat obscure,
and I have just one note on a detail: U+FEFF is, by definition, ZERO WIDTH
NO-BREAK SPACE when it occurs anywhere except at the start of a data stream.
In that role, it acts as invisible glue that prevents a line break where it
might otherwise be introduced. Even though you might say that another
character is preferred for such usage, U+FEFF is still the one that works
most widely, in popular software like Microsoft Word and Internet Explorer.
(Technically, they do not operate on plain text, but they do operate on
text, and U+FEFF is the text-level weapon that one can use.)
Therefore, regarding U+FEFF as not allowed in a plain text data stream would
be a big mistake, even though filtering it out would normally result in
nothing worse than inferior typography.
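
To illustrate the distinction, here is a minimal Python sketch of a filter
that removes U+FEFF only when it is the first code point of the data (where
it acts as a byte order mark) and leaves every later occurrence alone (where
it acts as ZERO WIDTH NO-BREAK SPACE). The names BOM and strip_leading_bom
are my own, chosen for illustration, and the sketch assumes the input has
already been decoded into a string of code points.

```python
BOM = "\ufeff"  # U+FEFF: byte order mark at stream start, ZWNBSP elsewhere


def strip_leading_bom(text: str) -> str:
    """Remove one U+FEFF only if it is the very first code point.

    Interior occurrences are kept, since there the character is
    invisible glue that prevents a line break.
    """
    return text[1:] if text.startswith(BOM) else text


# Hypothetical sample: a leading BOM, plus an interior U+FEFF gluing
# "100" to "km" so that no line break falls between them.
sample = BOM + "100" + BOM + "km"
cleaned = strip_leading_bom(sample)
```

A filter that instead deleted every U+FEFF would also delete the glue
between "100" and "km" in the sample, which is exactly the loss described
above.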
-- Yucca, http://www.cs.tut.fi/~jkorpela/