Re: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination

From: Philippe Verdy
Date: Sat Nov 08 2003 - 20:15:13 EST

    I'm curious about what name you would give to it.
    The name COMBINING CHARACTER JOINER is already used...

    In all our discussions we should have used the term "starter" (instead of
    just "base character", which is ambiguous) for any character of combining
    class 0. Starters include:

        Base characters (includes conjoining characters):
            letter, syllable or ideograph (gc=L*),
            number (gc=N*),
            punctuation (gc=P*),
            symbol (gc=S*),
            space (gc=Zs),
            agreed private use characters (gc=Co and private agreement)
        Starter Combining characters:
            (gc=M* and CC=0) such as CGJ
            (gc=C* except Co),
        Text separators:
            (gc=Zl, Zp)
        Unknown private use characters:
            (gc=Co and no private agreement)
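
    The starter / non-starter split above can be checked against the character
    database. The following is only a minimal sketch using Python's standard
    unicodedata module; the helper names is_starter and describe are
    hypothetical, and the labels simplify the list above (they do not
    distinguish separators or private use characters).

```python
import unicodedata

def is_starter(ch: str) -> bool:
    """True if the character has canonical combining class 0."""
    return unicodedata.combining(ch) == 0

def describe(ch: str) -> str:
    """Label a character with the starter / non-starter terminology."""
    if not is_starter(ch):
        return "non-starter"
    # gc=M* with combining class 0, such as CGJ
    if unicodedata.category(ch).startswith("M"):
        return "starter combining character"
    return "starter"

ALEF = "\u05D0"   # HEBREW LETTER ALEF
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER
METEG = "\u05BD"  # HEBREW POINT METEG

for ch in (ALEF, CGJ, METEG):
    print(f"U+{ord(ch):04X} gc={unicodedata.category(ch)} "
          f"ccc={unicodedata.combining(ch)} -> {describe(ch)}")
```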

    For other characters with combining class > 0, we should have used the term
    "non-starter", not the term "combining character", which may or may not be a
    starter.

    It is clear however that we made a distinction between "combining sequences"
    (made of a unique starter and optionally followed by non-starters) and
    "grapheme clusters" (which are made of one or more combining sequences). For
    example, the (hypothetical) encoded text:


    is made of 7 "combining sequences":

        <VS1, HOLAM>,
        <NUN, HATAF PATAH>,
        <CGJ, METEG>

    (where the starters are VAV, VS1, NUN, CGJ),
    and 3 "grapheme clusters":

        <ALEF, ZWJ, LAMED,
        <VAV, VS1, HOLAM>,

    (ZWJ is a format control and ignored in the determination of grapheme
    cluster boundaries).
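
    Splitting a string into combining sequences, as defined above (one starter
    followed by zero or more non-starters), can be sketched as follows. The
    function name combining_sequences is hypothetical, and the sample string
    is an assumption assembled from the characters named in the example; it is
    not necessarily the exact text this message had in mind.

```python
import unicodedata

def combining_sequences(text: str) -> list[str]:
    """Split text into runs of one starter plus its following non-starters."""
    sequences: list[str] = []
    for ch in text:
        # A starter (ccc=0) opens a new sequence. A leading non-starter with
        # no starter before it also gets its own (defective) sequence.
        if unicodedata.combining(ch) == 0 or not sequences:
            sequences.append(ch)
        else:
            sequences[-1] += ch
    return sequences

# Assumed sample: ALEF, ZWJ, LAMED, VAV, VS1, HOLAM, NUN, HATAF PATAH, CGJ, METEG
SAMPLE = ("\u05D0\u200D\u05DC"
          "\u05D5\uFE00\u05B9"
          "\u05E0\u05B2\u034F\u05BD")

for seq in combining_sequences(SAMPLE):
    print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

    With this assumed input, the starters are ALEF, ZWJ, LAMED, VAV, VS1, NUN
    and CGJ, so the function returns seven sequences.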

    Grapheme clusters may be created by grouping several combining sequences
    without using CGJ, ZWJ, ZWNJ, or variation selectors: see examples in South
    Asian scripts, and with Hangul jamo.
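
    The Hangul case can be illustrated with a small sketch. Python's standard
    library has no grapheme cluster segmentation, so this only shows, as an
    indirect indication, that three conjoining jamo are each a starter (three
    combining sequences) while NFC composes them into a single precomposed
    syllable. The choice of jamo is arbitrary.

```python
import unicodedata

# Conjoining jamo for one syllable:
# CHOSEONG KIYEOK, JUNGSEONG A, JONGSEONG KIYEOK
JAMO = "\u1100\u1161\u11A8"

# Every jamo has combining class 0, so each one starts its own
# combining sequence.
classes = [unicodedata.combining(ch) for ch in JAMO]

# NFC composes the three jamo into one precomposed Hangul syllable.
composed = unicodedata.normalize("NFC", JAMO)

print(classes)
print(len(JAMO), "starters ->", len(composed), "precomposed syllable")
```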

    Generally, collation and rendering of text work on grapheme clusters (or
    groups of these clusters with language-specific tailoring), not on
    combining sequences, whose role relates either to string identity
    excluding any concept of relative order (i.e. normalization and canonical
    equivalence) or to text transforms or folding.
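
    The link between combining sequences and normalization can be seen in
    canonical reordering: non-starters are sorted by combining class within a
    combining sequence, and a starter such as CGJ stops that reordering. A
    minimal sketch with Python's unicodedata module (the variable names are
    only illustrative):

```python
import unicodedata

NUN = "\u05E0"
HATAF_PATAH = "\u05B2"  # non-starter, lower combining class than METEG
METEG = "\u05BD"        # non-starter
CGJ = "\u034F"          # combining class 0, so it is a starter

# Both marks belong to the same combining sequence: NFD sorts them by
# combining class, so METEG moves after HATAF PATAH.
reordered = unicodedata.normalize("NFD", NUN + METEG + HATAF_PATAH)

# CGJ starts a new combining sequence, so the marks on either side of it
# are never reordered across it.
blocked = unicodedata.normalize("NFD", NUN + METEG + CGJ + HATAF_PATAH)

print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in blocked])
```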

    Compatibility equivalence is also defined, but in terms of neither combining
    sequences nor grapheme clusters: there may be a mapping from one
    character (i.e. only a part of a combining sequence) to several characters
    that belong to distinct combining sequences and distinct grapheme clusters,
    for example with some ligatures of base letters (example: the "ffi"
    ligature, which participates in only 1 combining sequence and only 1
    grapheme cluster, is mapped to 3 distinct combining sequences and 3 distinct
    grapheme clusters).
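
    The "ffi" case is easy to reproduce. This sketch assumes the ligature
    meant is U+FB03 LATIN SMALL LIGATURE FFI and uses Python's unicodedata
    module to compare canonical and compatibility decomposition:

```python
import unicodedata

FFI = "\uFB03"  # LATIN SMALL LIGATURE FFI (assumed to be the ligature meant)

canonical = unicodedata.normalize("NFD", FFI)       # canonical decomposition
compatibility = unicodedata.normalize("NFKD", FFI)  # compatibility decomposition

# The ligature is a single starter, and canonical decomposition leaves it alone.
print(len(canonical), unicodedata.combining(FFI))

# The compatibility mapping yields three letters, each of them a starter.
print(list(compatibility), [unicodedata.combining(c) for c in compatibility])
```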

    ----- Original Message -----
    From: "Peter Kirk" <>
    To: <>
    Sent: Sunday, November 09, 2003 1:20 AM
    Subject: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination

    > So that you don't try to hold your breath over the weekend to find out
    > what I am planning to propose, as announced on the main Unicode list...
    > The issue in question is the ligation of hataf vowels and meteg. Hataf
    > vowels with medial meteg are clear cases of ligatures between the basic
    > vowels and meteg. But there seems to be no mechanism in Unicode so far
    > to promote such a ligature. So, my suggestion is to propose a new
    > combining character COMBINING CHARACTER JOINER (combining class zero),
    > defined with semantics similar to ZWJ rather than CGJ i.e. to affect
    > ligation but not collation.
    > Comments?
    > --
    > Peter Kirk
    > (personal)
    > (work)

    This archive was generated by hypermail 2.1.5 : Sat Nov 08 2003 - 21:01:56 EST