Re: Questions on ZWNBS

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 04 2003 - 16:39:09 EDT

    Chris Jacobs wrote:

    > [ cc Theodore Smith ]
    >
    > So I had it wrong, it _is_ deprecated.

    It isn't exactly "deprecated", since deprecation has a
    rather strong sense in the standard, and is correlated with
    the formal assignment of a deprecated property to the
    character.

    Use of the code point U+FEFF is clearly *not* deprecated
    in the standard.

    The current situation is briefly as follows:

    The standard *requires* the use of U+FEFF for some of
    the Unicode encoding schemes. Details are spelled out in:

    http://www.unicode.org/book/preview/ch03.pdf

    Because of those requirements and the nature of the encoding
    scheme definitions, the occurrence of U+FEFF in initial
    position in *some* of the encoding schemes forces its
    interpretation as a zero width no-break space, rather
    than as a byte order mark. The difference is roughly
    as follows: a BOM is not formally part of the content
    of the text, but rather is part of the specification
    of the encoding scheme; a ZWNBSP is formally part of the
    content of the text.
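A minimal sketch of that difference, using Python's built-in codecs as a stand-in for the encoding schemes (the variable names are illustrative only, not from the standard). The same four bytes are decoded once under the UTF-16 scheme, where the initial U+FEFF acts as a BOM and is dropped, and once under the UTF-16BE scheme, where it stays in the text as a ZWNBSP:

```python
# Bytes FE FF 00 41: U+FEFF followed by "A", serialized big-endian.
data = b"\xfe\xff\x00\x41"

# UTF-16 encoding scheme: the initial U+FEFF is a byte order mark.
# It selects big-endian decoding and is not part of the decoded text.
as_utf16 = data.decode("utf-16")

# UTF-16BE encoding scheme: the byte order is fixed by the label, so an
# initial U+FEFF is content (a zero width no-break space) and is kept.
as_utf16be = data.decode("utf-16-be")

print(len(as_utf16), len(as_utf16be))
```

The two decoded strings differ in length by one code point, which is the whole of the distinction: the bytes are identical, and only the declared encoding scheme decides whether U+FEFF counts as content.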

    *Because* this distinction, which is required for backwards
    compatibility with existing usage of U+FEFF, is rather
    subtle and confusing, and *because*, nonetheless, the
    idea of having a character to indicate a no-break position
    is a useful one, the UTC standardized U+2060 WORD JOINER
    (in Unicode 3.2) as the *preferred* character to use
    for the zero width no-break space function.

    In other words, if what you need is to glue things together,
    i.e. a zero width no-break space *function*, then use
    U+2060. If what you need is a BOM for the encoding scheme
    specifications, then use U+FEFF.
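As a rough illustration of that division of labor, the sketch below keeps the two roles in separate helpers. Both `glue` and `encode_with_bom` are hypothetical names invented for this example, and big-endian UTF-16 is an arbitrary choice of serialization:

```python
import codecs

WORD_JOINER = "\u2060"  # U+2060, the preferred character for the "glue" function


def glue(left: str, right: str) -> str:
    """Hypothetical helper: join two strings with no break opportunity between them."""
    return left + WORD_JOINER + right


def encode_with_bom(text: str) -> bytes:
    """Hypothetical helper: serialize as big-endian UTF-16 with a leading BOM."""
    # codecs.BOM_UTF16_BE is the byte pair FE FF, i.e. U+FEFF in big-endian order.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


glued = glue("foo", "bar")
payload = encode_with_bom(glued)

# Decoding under the UTF-16 scheme strips the BOM but leaves the word joiner.
roundtrip = payload.decode("utf-16")
```

With this split, U+FEFF only ever appears at the serialization layer, and the text itself carries U+2060 wherever a no-break position is wanted.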

    What is *discouraged*, but not prohibited, of course, is
    using U+FEFF for a zero width no-break space *function*,
    precisely because that interacts so confusingly with
    the BOM.
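For existing text that already uses U+FEFF as glue, one possible cleanup is sketched below. The helper name `replace_interior_zwnbsp` is hypothetical, and the sketch assumes that a U+FEFF in first position should be left alone, since whether it is a BOM depends on the encoding scheme the text came from:

```python
def replace_interior_zwnbsp(text: str) -> str:
    """Hypothetical helper: swap U+FEFF for U+2060 everywhere except position 0."""
    if not text:
        return text
    # Leave the first code point untouched; it may be serving as a BOM.
    return text[0] + text[1:].replace("\ufeff", "\u2060")


cleaned = replace_interior_zwnbsp("\ufeffab\ufeffcd")
```

This is only a sketch under the stated assumption. Text known to come from a scheme such as UTF-16BE, where an initial U+FEFF is content, would need the first position handled as well.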

    --Ken


