Re: Questions on ZWNBS

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 04 2003 - 17:59:03 EDT

    Peter Kirk asked:

    > >In other words, if what you need is to glue things together,
    > >i.e. a zero width no-break space *function*, then use
    > >U+2060. If what you need is a BOM for the encoding scheme
    > >specifications, then use U+FEFF.
    > >
    > >What is *discouraged*, but not prohibited, of course, is
    > >using U+FEFF for a zero width no-break space *function*,
    > >precisely because that interacts so confusingly with
    > >the BOM.
    > >
    > >--Ken
    > >
    > And what if you need a ZWNBS function for something other than gluing
    > things together? For example, as a carrier for a string or line initial
    > diacritical mark when no spacing is required?

    This is not something sanctioned by the standard.

    The carrier for a combining mark that is to display in isolation without
    a base character is U+0020 SPACE. If you want to also indicate the
    absence of a line break opportunity, then the carrier is U+00A0
    NO-BREAK SPACE (NBSP).
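
    As an illustration only, here is a minimal Python sketch of those two
    carrier sequences. It assumes the standard unicodedata module, and it
    picks U+0301 COMBINING ACUTE ACCENT as an arbitrary example mark; the
    variable names are hypothetical.

```python
import unicodedata

# Arbitrary example mark (an assumption for illustration): U+0301.
ACUTE = "\u0301"

# A combining mark has a non-zero canonical combining class.
assert unicodedata.combining(ACUTE) != 0

# Isolated display: U+0020 SPACE carries the mark.
isolated = "\u0020" + ACUTE

# Isolated display with no line break opportunity: U+00A0 NBSP carries it.
isolated_no_break = "\u00A0" + ACUTE

for seq in (isolated, isolated_no_break):
    print([unicodedata.name(ch) for ch in seq])
```

    In both sequences the carrier precedes the mark, so the mark applies
    to the carrier and not to whatever character follows.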

    Despite its name, U+FEFF ZWNBS is *NOT* a space character. It is
    formally gc=Cf, not gc=Zs. It also does not have the White_Space
    property.
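
    The General_Category values can be checked against the character
    database, for instance with the following Python sketch. It assumes
    the standard unicodedata module; the short labels are hypothetical.
    Note that str.isspace() is only a rough stand-in for the White_Space
    property, since Python defines it in terms of general category and
    bidirectional class.

```python
import unicodedata

# Hypothetical short labels for the four characters under discussion.
chars = {
    "SPACE": "\u0020",
    "NBSP": "\u00A0",
    "WJ": "\u2060",
    "ZWNBSP": "\uFEFF",
}

# General_Category as recorded in the Unicode Character Database.
categories = {label: unicodedata.category(ch) for label, ch in chars.items()}

# Python's own whitespace test (an approximation of White_Space).
is_space = {label: ch.isspace() for label, ch in chars.items()}

for label in chars:
    print(label, categories[label], is_space[label])
```

    SPACE and NBSP come out as Zs, while U+2060 and U+FEFF come out as Cf.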

    So "a ZWNBS function for something other than gluing things together"
    is a contradiction in terms under the current definitions of the standard.
    The *meaning* of the "ZWNBS function" is its behavior in the
    context of UAX #14, Line Breaking Properties. See the (normative)
    entry for WJ, Word Joiner, in UAX #14:

    http://www.unicode.org/reports/tr14/

    > This is one of the
    > suggestions for some of the Hebrew problems, but I have had no response
    > to my suggestion of using U+2060, which is inappropriately named for the
    > function I have in mind.

    The function I think you have in mind is not isolated display of
    a combining mark, but rather trying to find a mechanism for
    getting around the conformance strictures of the standard, to
    get a combining mark to apply to a *following* base
    character, rather than to a *preceding* base character.

    Trying to use U+FEFF *or* U+2060 to do this would be inappropriate.

    --Ken


