From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 04 2003 - 16:39:09 EDT
Chris Jacobs wrote:
> [ cc Theodore Smith ]
>
> So I had it wrong, it _is_ deprecated.
It isn't exactly "deprecated", since deprecation has a
rather strong sense in the standard, and is correlated with
the formal assignment of a deprecated property to the
character.
Use of the code point U+FEFF is clearly *not* deprecated
in the standard.
The current situation is briefly as follows:
The standard *requires* the use of U+FEFF for some of
the Unicode encoding schemes. Details are spelled out in:
http://www.unicode.org/book/preview/ch03.pdf
Because of those requirements and the nature of the encoding
scheme definitions, the occurrence of U+FEFF in initial
position in *some* of the encoding schemes forces its
interpretation as a zero width no-break space, rather
than as a byte order mark. The difference is roughly
as follows: a BOM is not formally part of the content
of the text, but rather is part of the specification
of the encoding scheme; a ZWNBSP is formally part of the
content of the text.
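
As a rough illustration of that difference, here is a minimal Python sketch. It assumes the behaviour of Python's built-in "utf-16" and "utf-16-be" codecs, which are not mentioned in the standard. The same four bytes decode differently depending on which encoding scheme is declared:

```python
# Big-endian UTF-16 bytes for U+FEFF followed by "A".
data = b"\xfe\xff\x00A"

# Declared as plain "utf-16": the leading U+FEFF is consumed as a
# byte order mark and does not appear in the decoded text.
as_utf16 = data.decode("utf-16")

# Declared as "utf-16-be": the byte order is already fixed by the label,
# so the leading U+FEFF is kept as a character in the content (a ZWNBSP).
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))    # text without the BOM
print(repr(as_utf16be))  # text that starts with U+FEFF
```
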
*Because* this distinction, which is required for backwards
compatibility with existing usage of U+FEFF, is rather
subtle and confusing, and *because*, nonetheless, the
idea of having a character to indicate a no-break position
is a useful one, the UTC standardized U+2060 WORD JOINER
(in Unicode 3.2) as the *preferred* character to use
for that purpose.
In other words, if what you need is to glue things together,
i.e. a zero width no-break space *function*, then use
U+2060. If what you need is a BOM for the encoding scheme
specifications, then use U+FEFF.
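
A minimal sketch of that division of labour, again in Python. The sample string and variable names are only illustrative, and the example relies on Python's "utf-16" codec writing the BOM itself on encode:

```python
import codecs
import unicodedata

WORD_JOINER = "\u2060"  # the character to use for the "glue" function

# Glue "10" and "km" together with U+2060 in the content itself.
content = "10" + WORD_JOINER + "km"

# Let the encoding scheme supply U+FEFF as the BOM; Python's "utf-16"
# codec prepends it in the platform's native byte order.
encoded = content.encode("utf-16")
has_bom = encoded[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding with the same scheme strips the BOM and returns the content
# unchanged, word joiner included.
decoded = encoded.decode("utf-16")

print(unicodedata.name(WORD_JOINER), has_bom, decoded == content)
```

The BOM never appears in `content`, and the word joiner never depends on its position in the byte stream.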
What is *discouraged*, but not prohibited, of course, is
using U+FEFF for a zero width no-break space *function*,
precisely because that interacts so confusingly with
the BOM.
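
For existing text that already uses U+FEFF as a joiner, one possible cleanup is sketched below. The function name is hypothetical, and leaving an initial U+FEFF untouched is an assumption made here (since it may be a BOM), not something the standard prescribes:

```python
ZWNBSP = "\ufeff"
WORD_JOINER = "\u2060"

def replace_interior_zwnbsp(text: str) -> str:
    """Replace non-initial U+FEFF with U+2060 (hypothetical helper).

    An initial U+FEFF is left alone, on the assumption that it may be
    a byte order mark rather than content.
    """
    if text.startswith(ZWNBSP):
        return ZWNBSP + text[1:].replace(ZWNBSP, WORD_JOINER)
    return text.replace(ZWNBSP, WORD_JOINER)

print(repr(replace_interior_zwnbsp("a\ufeffb")))
```
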
--Ken