Re: Questions on ZWNBS - for line initial holam plus alef

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 04 2003 - 19:52:25 EDT

    Peter,

    > >The carrier for a combining mark that is to display in isolation without
    > >a base character is U+0020 SPACE. If you want to also indicate the
    > >absence of a line break opportunity, then the carrier is U+00A0
    > >NO-BREAK SPACE (NBSP).
    > >
    > Neither of these is appropriate to the case I have in mind (described in
    > greater detail below) as they are not zero width and therefore give an
    > unwanted indent at the start of a line.

    Of course they are not zero width: the whole point of this
    convention is to display a non-spacing mark in isolation, not
    applied to a base character.
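
    As a minimal sketch of that convention (not text from the standard),
    the following builds a carrier plus mark sequence. The helper name
    isolated() and the choice of U+05B9 HEBREW POINT HOLAM as the sample
    mark are assumptions made for illustration only.

```python
import unicodedata

HOLAM = "\u05B9"  # HEBREW POINT HOLAM, used here only as a sample mark


def isolated(mark: str, no_break: bool = False) -> str:
    """Return carrier + mark, for showing a combining mark on its own.

    The carrier is U+0020 SPACE, or U+00A0 NO-BREAK SPACE when a line
    break opportunity should also be suppressed. (Hypothetical helper.)
    """
    # General categories Mn, Mc and Me are the combining marks.
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected a combining mark")
    carrier = "\u00A0" if no_break else "\u0020"
    return carrier + mark


plain = isolated(HOLAM)                    # SPACE + HOLAM
unbroken = isolated(HOLAM, no_break=True)  # NBSP + HOLAM
```

    Either result occupies the width of a space when rendered, which is
    exactly the indent being objected to above.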

    > U+200B ZERO WIDTH SPACE might be
    > appropriate, but this has the problem that it is a break opportunity,
    > which is not always appropriate.

    U+200B ZERO WIDTH SPACE is not appropriate, for the same reason
    that U+FEFF (or U+2060) is not appropriate: the Standard does
    not specify the display of non-spacing marks on it as a means
    of showing the marks without base characters. And, as you indicate,
    U+200B (like U+FEFF and U+2060) is implicated in the control
    of line break opportunities. None of them is defined
    as a glyph display anchor or some such.
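
    For anyone who wants to see how these characters are classified in
    the character database, here is a small sketch using Python's
    standard unicodedata module. The describe() helper is hypothetical,
    and the values reported depend on the Unicode version bundled with
    the interpreter. In recent versions the two space carriers are
    category Zs, while the zero width characters are format controls (Cf).

```python
import unicodedata

# SPACE, NBSP, ZWSP, WORD JOINER, ZWNBSP
CANDIDATES = ["\u0020", "\u00A0", "\u200B", "\u2060", "\uFEFF"]


def describe(ch: str) -> tuple:
    """Return (code point, character name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    print(*describe(ch))
```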

    >

    > >Their
    > >names may be misleading; people intending to use them for any other
    > >function should carefully read the sections of the Unicode Standard
    > >that discuss their usage.
    > >
    > But which sections? Where is the index, online?

    Patience please. The editor is paddling as fast as she can. If
    you will refrain from clicking the remote for just a day or two
    longer, all will be revealed.

    > It is unfortunate that
    > there are no links from the character charts or the database to the
    > various places where the uses of the characters are explained.

    Users of the new online edition of Unicode 4.0 will be pleasantly
    surprised, I predict. The General Index is much expanded and
    improved, and in the pdf the index markers are fully linked,
    so you will be able to click through from the index to a location
    in the text which is indexed. Other links for section references
    and references to external documents will also be "live" in the
    pdf. :-)

    > All there
    > is is a character name, and as I have found quite often this character
    > name is seriously misleading if not actually incorrect. It is highly
    > unfortunate that it is not permitted to change these misleading names.

    Yes, we all agree, but we live with it. For some of the obnoxious
    instances, like ZWNBSP, it is better to just live with the
    abbreviations as opaque monikers, like a "BXLZFITZL", rather than
    focussing on whether the fact that there is a "SPACE" in its
    name actually makes it a space character.

    >
    > As it is, the note at U+FEFF in the character charts reads "use as an
    > indication of non-breaking is deprecated...", although you wrote that
    > this was not deprecated.

    The Unicode *character* U+FEFF is not deprecated, in the precise
    sense of deprecation which is correlated with the character having
    the "Deprecated" property in the Unicode Character Database.
    (U+206C INHIBIT ARABIC FORM SHAPING, for example, *is* deprecated
    in this sense.)

    The use of U+FEFF as a non-breaker (= word joiner) is deprecated,
    in the more general sense of "disapproved of, not recommended", because
    use of U+2060 WORD JOINER is less ambiguous and less trouble-prone.
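
    One way to act on that recommendation, sketched under the
    assumption that a leading U+FEFF is a byte order mark and should be
    left alone while any later U+FEFF was meant as a non-breaker. The
    function name prefer_word_joiner() is made up for this example.

```python
ZWNBSP = "\uFEFF"       # ZERO WIDTH NO-BREAK SPACE (also the BOM)
WORD_JOINER = "\u2060"  # WORD JOINER


def prefer_word_joiner(text: str) -> str:
    """Replace every non-initial U+FEFF with U+2060.

    The first character is copied unchanged, so a leading U+FEFF
    (assumed here to be a byte order mark) survives.
    """
    if not text:
        return text
    head, tail = text[0], text[1:]
    return head + tail.replace(ZWNBSP, WORD_JOINER)


sample = "\uFEFFab\uFEFFcd"
converted = prefer_word_joiner(sample)
```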

    > But there is no note that use of ZERO WIDTH
    > NO-BREAK SPACE as a zero width no-break space is deprecated or "a
    > contradiction in terms of the current definition of the standard".

    There is further explication in both UAX #14 and in the relevant
    sections of Chapter 15 in Unicode 4.0.

    > Are
    > you surprised that I am confused?

    No. That's why I'm spending time trying to keep making the
    clarifications for you and others.

    > >The function I think you have in mind is not isolated display of
    > >a combining mark, but rather trying to find a mechanism for
    > >getting around the conformance strictures of the standard, to
    > >get a combining mark to apply to a *following* base
    > >character, rather than to a *preceding* base character.
    > >
    > >
    > If by "apply" in the above you mean "be positioned adjacent to",

    No, I mean logical application, in this context.

    There are admitted deficiencies in the standard's text, even
    now, regarding just what the "graphic interaction" for a combining
    mark means -- that is grist for the Unicode 5.0 mill to grind
    very finely, I suggest.

    > there
    > is already a problem with the standard: the EXISTING Hebrew page of the
    > standard is in contravention to its conformance strictures. This is
    > because under the existing standard (irrespective of any changes being
    > proposed) and in legacy encodings, the combining mark holam, which is
    > usually graphically positioned above the preceding base character, is in
    > certain environments, specifically when followed by a silent alef (holam
    > male is a separate issue), graphically positioned above the following
    > base character. But the standard has anticipated this kind of difficulty
    > by recognising that positioning is not always consistent with logical
    > ordering, see the note on Indic vowel signs in The Unicode Standard 4.0
    > section 2.10, subsection "Sequence of Base Characters and Diacritics",
    > http://www.unicode.org/book/preview/ch02.pdf.

    Or meditate on Figure 2-3, Unicode Character Code to Rendered Glyphs.
    That is the fundamental mandala of the standard. ;-)
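
    To make the code-versus-glyph distinction concrete, here is a rough
    sketch that lists the stored (logical) order of two short sequences:
    lamed + holam + alef, and the Devanagari syllable "ki", whose vowel
    sign is drawn to the left of the consonant it follows in memory.
    The sample strings and the code_sequence() helper are my own
    choices for illustration, not examples taken from the standard.

```python
import unicodedata


def code_sequence(text: str) -> list:
    """List (code point, name, canonical combining class) in stored order."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.combining(c))
        for c in text
    ]


LO = "\u05DC\u05B9\u05D0"  # lamed, holam, alef
KI = "\u0915\u093F"        # ka, vowel sign i

for label, text in (("lo", LO), ("ki", KI)):
    print(label)
    for entry in code_sequence(text):
        print("  ", *entry)
```

    In both cases the mark is stored after the character it logically
    applies to, wherever the rendering process ends up placing the glyph.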

    > This is a documented
    > special case; Hebrew holam followed by silent alef is also a special
    > case whether you like it or not, it just hasn't been documented. It
    > could be removed, but that would require changes to every existing
    > (ancient or modern) pointed Hebrew text.

    The discussion of details of how to represent these sequences
    should probably migrate back to the hebrew@unicode.org list.

    --Ken

    >
    > >Trying to use U+FEFF *or* U+2060 to do this would be inappropriate.
    > >
    > >
    > Understood. I await alternative suggestions.


