Re: ZWJ, ZWNJ, CGJ and combination

From: Mark Davis (mark.davis@jtcsv.com)
Date: Sun Nov 09 2003 - 14:11:58 EST

    Let's try to be clear on the terms.

    Look at the definition of combining sequences:
    D17 Combining character sequence: A character sequence consisting of either a
    base character followed by a sequence of one or more combining characters, or a
    sequence of one or more combining characters.

    Thus a combining character sequence *cannot* contain a ZWJ or any other Cf.

    Any use of a ZWJ before a combining mark produces a *defective* combining
    character sequence (D17a), which isolates the combining mark from any preceding
    base character.
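
    A minimal Python sketch of this isolation effect. It assumes that a
    "combining character" is anything with General_Category Mn, Mc, or Me, and
    that a "base character" is anything that is neither combining, control (Cc),
    nor format (Cf). The helper names are hypothetical, and the base-character
    test is a simplification rather than the full definition in the standard.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER, General_Category Cf


def is_combining(ch):
    # Assumption: "combining character" means General_Category Mn, Mc, or Me.
    return unicodedata.category(ch).startswith("M")


def is_base(ch):
    # Assumption (simplified): a base character is neither combining,
    # control (Cc), nor format (Cf).
    return not is_combining(ch) and unicodedata.category(ch) not in ("Cc", "Cf")


def base_for_mark(text, i):
    """Return the base character that the combining mark at text[i]
    applies to, or None if the mark is in a defective sequence."""
    if not is_combining(text[i]):
        raise ValueError("text[i] is not a combining character")
    j = i - 1
    # Skip back over any other combining marks in the same sequence.
    while j >= 0 and is_combining(text[j]):
        j -= 1
    # Anything other than a base character (e.g. a Cf such as ZWJ, or the
    # start of the string) means there is no base for this mark.
    return text[j] if j >= 0 and is_base(text[j]) else None


plain = "a\u0301"                # a + COMBINING ACUTE ACCENT
isolated = "a" + ZWJ + "\u0301"  # a + ZWJ + COMBINING ACUTE ACCENT

print(base_for_mark(plain, 1))     # a
print(base_for_mark(isolated, 2))  # None
```

    With the ZWJ in between, the acute accent no longer has a base character
    under these assumptions, which is what makes the sequence defective.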

    And as I said earlier:

    > - *Default* grapheme clusters do not include ZWJ; as a matter of fact, default
    > grapheme clusters, except for Hangul Jamo Syllables and a few exceptional
    > cases, are identical with combining sequences.
    > http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

    > - *Tailored* grapheme clusters may include longer sequences, but it is not at
    > all obvious whether they would ever contain ZWJ or ZWNJ.

    I'll expand on the latter. What constitutes a tailored grapheme cluster is up to
    a particular process, and so one could contain a ZWJ. However, any combining
    mark after a ZWJ does *not* apply to a previous base character within that
    tailored grapheme cluster, so the use of a ZWJ would isolate that combining
    mark. Such a sequence would not correspond to anything used in a natural
    language.

    Mark
    __________________________________
    http://www.macchiato.com
    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Peter Kirk" <peterkirk@qaya.org>
    To: "Mark Davis" <mark.davis@jtcsv.com>
    Cc: "Unicode List" <unicode@unicode.org>
    Sent: Sun, 2003 Nov 09 09:19
    Subject: Re: ZWJ, ZWNJ, CGJ and combination

    > On 08/11/2003 17:09, Mark Davis wrote:
    >
    > >I agree with the first part of your analysis. By the phrase "requesting
    > >ligation of combining characters" it is unclear to me what you mean, and
    > >whether that is the right solution to whatever problem you are referring to.
    > >
    > >Mark
    > >__________________________________
    > >http://www.macchiato.com
    > >► शिष्यादिच्छेत्पराजयम् ◄
    > >
    > >
    > >
    > A further reply to this one:
    >
    > On the bidi list Paul Nelson pointed out that in Khmer ZWJ and ZWNJ do
    > not break combining sequences; or at least they do not break grapheme
    > clusters, which is not quite the same thing. And the same may be true of
    > Indic scripts, although in the examples I found ZWJ/ZWNJ is always at
    > the end of a combining sequence. Are ZWJ and ZWNJ actually used within
    > combining character sequences (or what would be such sequences if not
    > technically broken)? Is there some tension here with the general
    > definition of combining character sequences?
    >
    > If Khmer really does do this, and unless there are any real objections
    > to this practice, perhaps the best way ahead, rather than defining a new
    > COMBINING CHARACTER JOINER and changing the Khmer encoding, is to adjust
    > the definition of combining character sequences to allow ZWJ, ZWNJ and
    > perhaps some other suitable layout control characters to be included
    > within such sequences. This would allow the Hebrew issue to be solved in
    > a way analogous to the Khmer issue.
    >
    > --
    > Peter Kirk
    > peter@qaya.org (personal)
    > peterkirk@qaya.org (work)
    > http://www.qaya.org/
    >
    >
    >
