Re: Questions on ZWNBS - for line initial holam plus alef

From: Peter Kirk
Date: Mon Aug 04 2003 - 18:57:07 EDT


    On 04/08/2003 14:59, Kenneth Whistler wrote:

    >Peter Kirk asked:
    >>>In other words, if what you need is to glue things together,
    >>>i.e. a zero width no-break space *function*, then use
    >>>U+2060. If what you need is a BOM for the encoding scheme
    >>>specifications, then use U+FEFF.
    >>>What is *discouraged*, but not prohibited, of course, is
    >>>using U+FEFF for a zero width no-break space *function*,
    >>>precisely because that interacts so confusingly with
    >>>the BOM.
    >>And what if you need a ZWNBS function for something other than gluing
    >>things together? For example, as a carrier for a string or line initial
    >>diacritical mark when no spacing is required?
    >This is not something sanctioned by the standard.
    >The carrier for a combining mark that is to display in isolation without
    >a base character is U+0020 SPACE. If you want to also indicate the
    >absence of a line break opportunity, then the carrier is U+00A0
    Neither of these suits the case I have in mind (described in greater
    detail below), as they are not zero width and therefore give an
    unwanted indent at the start of a line. U+200B ZERO WIDTH SPACE might
    serve, but it introduces a break opportunity, which is not always
    wanted.
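
    To make the candidate carriers concrete, here is a minimal sketch using
    Python's standard unicodedata module. The CARRIERS table and the
    describe_carriers helper are illustrative names, not anything defined
    by the standard. The module reports general categories and combining
    classes only; it does not expose the UAX #14 line breaking classes, so
    the break-opportunity question is not checked here.

```python
import unicodedata

HOLAM = "\u05B9"  # HEBREW POINT HOLAM, a combining mark

# Candidate carriers mentioned in the thread (this table is illustrative).
CARRIERS = ["\u0020", "\u00A0", "\u200B"]


def describe_carriers():
    """Return (name, general category, carrier + holam) for each candidate."""
    rows = []
    for ch in CARRIERS:
        rows.append((unicodedata.name(ch), unicodedata.category(ch), ch + HOLAM))
    return rows


if __name__ == "__main__":
    # The mark itself: a nonspacing mark with a non-zero combining class.
    print(unicodedata.name(HOLAM), unicodedata.category(HOLAM),
          unicodedata.combining(HOLAM))
    for name, gc, seq in describe_carriers():
        code_points = " ".join("U+%04X" % ord(c) for c in seq)
        print(name, "gc=" + gc, "sequence:", code_points)
```

    Whether a given carrier actually renders without an indent depends on
    the font and rendering engine, which this sketch cannot show.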

    >Despite its name, U+FEFF ZWNBS is *NOT* a space character. It is
    >formally gc=Cf, not gc=Zs. It also does not have the White_Space
    >So "a ZWNBS function for something other than gluing things together"
    >is a contradiction in terms of the current definition of the standard.
    >The *meaning* of the "ZWNBS function" is its behavior in the
    >context of UAX #14, Line Breaking Properties. See the WJ Word joiner
    >entry (normative) of UAX #14:
    Thank you, Ken, and also Mark. I didn't know where to find these
    details. Mark wrote:

    >names may be misleading; people intending to use them for any other
    >function should carefully read the sections of the Unicode Standard
    >that discuss their usage.
    But which sections? Where is the index online? It is unfortunate that
    there are no links from the character charts or the database to the
    various places where the uses of the characters are explained. All that
    is given is a character name and, as I have found quite often, that
    name is seriously misleading if not actually incorrect. It is highly
    unfortunate that it is not permitted to change these misleading names.

    As it is, the note at U+FEFF in the character charts reads "use as an
    indication of non-breaking is deprecated...", although you wrote that
    this was not deprecated. But there is no note that use of ZERO WIDTH
    NO-BREAK SPACE as a zero width no-break space is deprecated or "a
    contradiction in terms of the current definition of the standard". Are
    you surprised that I am confused?
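
    The mismatch between name and properties can at least be checked
    mechanically. The following is a rough sketch, again with Python's
    unicodedata module; name_vs_properties is a hypothetical helper, and
    str.isspace() is Python's own whitespace test, used here only as an
    approximation of the White_Space property rather than as the property
    itself.

```python
import unicodedata


def name_vs_properties(ch):
    """Compare what a character's name suggests with what its data says."""
    name = unicodedata.name(ch)
    return {
        "name": name,
        "name_says_space": "SPACE" in name,
        "general_category": unicodedata.category(ch),
        # Python's whitespace test, an approximation of White_Space.
        "str_isspace": ch.isspace(),
    }


if __name__ == "__main__":
    for ch in ("\u00A0", "\u2060", "\uFEFF"):
        print("U+%04X" % ord(ch), name_vs_properties(ch))
```

    For U+FEFF the name contains SPACE while the general category is Cf,
    which is exactly the gap between name and function discussed here.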

    Ken continued:

    >>This is one of the
    >>suggestions for some of the Hebrew problems, but I have had no response
    >>to my suggestion of using U+2060, which is inappropriately named for the
    >>function I have in mind.
    >The function I think you have in mind is not isolated display of
    >a combining mark, but rather trying to find a mechanism for
    >getting around the conformance strictures of the standard, to
    >get a combining mark to apply to a *following* base
    >character, rather than to a *preceding* base character.
    If by "apply" in the above you mean "be positioned adjacent to", there
    is already a problem with the standard: the EXISTING Hebrew page of the
    standard is in contravention of its conformance strictures. This is
    because under the existing standard (irrespective of any changes being
    proposed) and in legacy encodings, the combining mark holam, which is
    usually graphically positioned above the preceding base character, is in
    certain environments graphically positioned above the following base
    character, specifically when followed by a silent alef (holam male is a
    separate issue). But the standard has anticipated this kind of
    difficulty by recognising that positioning is not always consistent with
    logical ordering; see the note on Indic vowel signs in The Unicode
    Standard 4.0, section 2.10, subsection "Sequence of Base Characters and
    Diacritics". This is a documented special case. Hebrew holam followed by
    silent alef is also a special case, whether you like it or not; it just
    hasn't been documented. It could be removed, but that would require
    changes to every existing (ancient or modern) pointed Hebrew text.
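
    The distinction between logical order and graphical position can be
    sketched in code. The sample sequence (lamed, holam, alef) is an
    illustration chosen for this sketch, not one taken from the thread, and
    base_of_each_mark is a hypothetical helper that treats only characters
    of general category Mn as combining marks, which is a simplification.

```python
import unicodedata

# Illustrative sequence: lamed + holam + alef, in logical (encoded) order.
WORD = "\u05DC\u05B9\u05D0"


def base_of_each_mark(text):
    """Pair each nonspacing mark with the nearest preceding non-mark."""
    pairs = []
    base = None
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            base_name = unicodedata.name(base) if base is not None else None
            pairs.append((unicodedata.name(ch), base_name))
        else:
            base = ch
    return pairs


if __name__ == "__main__":
    for mark, base in base_of_each_mark(WORD):
        print(mark, "is encoded after", base)
```

    In the encoded text the holam follows the lamed. Where it is drawn, over
    the lamed or over the following alef, is a rendering matter that the
    code point sequence alone does not record.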

    >Trying to use U+FEFF *or* U+2060 to do this would be inappropriate.
    Understood. I await alternative suggestions.


    Peter Kirk
