ZWJ and ZWNJ in combining sequences, was: New Public Review Issue posted

From: Peter Kirk (
Date: Fri Jan 16 2004 - 17:53:58 EST

    On 16/01/2004 11:17, Rick McGowan wrote:

    >The Unicode Technical Committee has posted a new issue for public review
    >and comment. Details are on the following web page:
    >The review period for the new item closes on January 27, 2004.
    >Please see the page for links to discussion and relevant documents.
    >Briefly, the new issue is:
    >Issue #27 Joiner/Nonjoiner in Combining Character Sequences
    >Unicode 4.0 describes the structure of Khmer syllables, saying that they
    >may contain an interior ZWJ. There is a problem with this that needs to be
    >resolved in 4.0.1, because some of the characters later in the syllable can
    >be combining characters. This paper describes a proposal to fix this
    >problem. As a part of the proposal, a choice has to be made between two
    >options, A and B.

    Although this issue has been brought up for review in the light of the
    problem with Khmer, it also has a significant impact on Hebrew, and for
    that reason I am bringing it to the attention of the Hebrew list as well.

    I support the main proposal, which is to allow the ZWJ and ZWNJ
    characters to occur within combining character sequences. When they
    occur between two combining marks, they will indicate joining and
    non-joining forms respectively of those two combining marks. In Hebrew,
    this will provide a convenient mechanism for requesting or inhibiting
    ligatures between meteg and hataf vowels (see section 3.5).
    Previously there was no such mechanism that was strictly
    compatible with Unicode definitions. With this change, the following
    distinctions can be made:

    <vowel, ZWJ, meteg> - medial meteg preferred, but only possible if the
    vowel is a hataf vowel (ZWJ must be ignored for other vowels)

    <vowel, ZWNJ, meteg> - left meteg preferred

    <vowel, meteg> - no preference, font default should be used (probably
    left meteg with all vowels)

    <meteg, CGJ, vowel> - right meteg preferred - or should this last one be
    <meteg, ZWNJ, vowel>, considering that ZWNJ will have the same effect as
    CGJ of blocking canonical reordering?
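    To make the four cases concrete, here is a minimal sketch of how a
    renderer might classify the mark sequence after a base letter. The
    function name meteg_preference and its return labels are hypothetical,
    not part of any Unicode specification or font API. The sketch follows
    the first form listed for right meteg (<meteg, CGJ, vowel>).

```python
# Hypothetical classifier for the meteg distinctions listed above.
METEG = "\u05BD"   # HEBREW POINT METEG (ccc 22)
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER
HATAF_VOWELS = {"\u05B1", "\u05B2", "\u05B3"}  # hataf segol, hataf patah, hataf qamats


def meteg_preference(marks: str) -> str:
    """Return a preferred meteg position for a mark sequence (base letter omitted)."""
    if len(marks) == 3 and marks[0] == METEG and marks[1] == CGJ:
        return "right"        # <meteg, CGJ, vowel>
    if len(marks) == 3 and marks[2] == METEG:
        vowel, joiner = marks[0], marks[1]
        if joiner == ZWNJ:
            return "left"     # <vowel, ZWNJ, meteg>
        if joiner == ZWJ:
            # <vowel, ZWJ, meteg>: medial only with hataf vowels; ZWJ is ignored otherwise
            return "medial" if vowel in HATAF_VOWELS else "default"
    if len(marks) == 2 and marks[1] == METEG:
        return "default"      # <vowel, meteg>: leave the choice to the font
    return "unknown"


print(meteg_preference("\u05B2\u200D\u05BD"))  # hataf patah + ZWJ + meteg -> medial
print(meteg_preference("\u05B8\u200D\u05BD"))  # qamats + ZWJ + meteg -> default
```

    If the UTC preferred <meteg, ZWNJ, vowel> for right meteg, only the
    first test in the function would change.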

    I have a small concern that at least potentially there might be a need
    to promote or inhibit a ligature between combining marks which do not
    come together in canonical order. For example, in principle a single
    Hebrew base character might be combined with a hataf vowel (ccc 11-13),
    dagesh (ccc 21) and meteg (ccc 22). In canonical order the dagesh would
    be reordered between the hataf vowel and the meteg, either before or
    after ZWJ/ZWNJ (depending on where it was entered), and would interfere
    with the mechanism. It might be
    necessary to code <dagesh, CGJ, hataf vowel, ZW(N)J, meteg> or <hataf
    vowel, ZW(N)J, meteg, CGJ, dagesh>. No such combination actually occurs
    in the standard text of the Hebrew Bible, but in principle one might be
    found in other texts.
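    The reordering problem can be checked with Python's standard unicodedata
    module: NFD normalization applies canonical ordering, which stably sorts
    each run of marks with nonzero combining class but never moves a mark
    across a ccc-0 character such as ZWJ or CGJ. The base letter (bet), the
    choice of hataf patah, and the short labels are illustrative assumptions.

```python
import unicodedata

# Code points used in the example, with their canonical combining classes
BET = "\u05D1"          # HEBREW LETTER BET, ccc 0 (base)
HATAF_PATAH = "\u05B2"  # ccc 12
DAGESH = "\u05BC"       # ccc 21
METEG = "\u05BD"        # ccc 22
ZWJ = "\u200D"          # ccc 0
CGJ = "\u034F"          # ccc 0

LABELS = {BET: "bet", HATAF_PATAH: "hataf-patah", DAGESH: "dagesh",
          METEG: "meteg", ZWJ: "ZWJ", CGJ: "CGJ"}


def reorder(seq):
    """Put seq into canonical order (via NFD) and return readable labels."""
    return [LABELS[ch] for ch in unicodedata.normalize("NFD", seq)]


# Dagesh typed last: swapped in front of meteg, so it lands after ZWJ.
print(reorder(BET + HATAF_PATAH + ZWJ + METEG + DAGESH))
# -> ['bet', 'hataf-patah', 'ZWJ', 'dagesh', 'meteg']

# Dagesh typed first: swapped after the hataf vowel, so it lands before ZWJ.
print(reorder(BET + DAGESH + HATAF_PATAH + ZWJ + METEG))
# -> ['bet', 'hataf-patah', 'dagesh', 'ZWJ', 'meteg']

# CGJ has ccc 0, so it blocks reordering and both workarounds survive unchanged.
print(reorder(BET + DAGESH + CGJ + HATAF_PATAH + ZWJ + METEG))
print(reorder(BET + HATAF_PATAH + ZWJ + METEG + CGJ + DAGESH))
```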

    At first sight I see no reason to express a preference between option A
    and option B in the review issue, for Hebrew or any other reason.

    Please note the following if you wish to make official feedback to the
    UTC on this matter.

    >If you have comments for official UTC consideration, please post them by
    >submitting your comments through our feedback & reporting page:
    >If you wish to discuss issues on the Unicode mail list, then please use
    >the following link to subscribe (if necessary). Please be aware that
    >discussion comments on the Unicode mail list are not automatically recorded
    >as input to the UTC. You must use the reporting link above to generate
    >comments for UTC consideration.
    >Let me take this opportunity also to remind everyone that the closing date
    >for comment on several other public review issues is approaching, so if
    >you have comments, please try to send them in soon.
    >Note: If you are a liaison representative, please forward this message as
    >appropriate within your organization.
    > Rick McGowan
    > Unicode, Inc.

    Peter Kirk