Re: [hebrew] Re: Draft proposal for Unicode encoding of holam male

From: Ernest Cline
Date: Tue Apr 13 2004 - 12:15:38 EDT

    > [Original Message]
    > From: Philippe Verdy
    > From: "John Hudson"
    > > Philippe Verdy wrote:
    > >
    > > >>>The problem with <HOLAM, VAV> is that it may follow (in the encoded
    > > >>>sequence) some other grapheme cluster terminated by other
    > > >>>marks. So for me the best candidate would be: <CGJ, HOLAM, VAV>...
    > >
    > > Would ZWNJ perform the same function?
    > >
    > > If the intent is that the holam be associated with the vav rather than
    > > preceding letter, it seems to me that a control character that does not
    > > suggest joining or combining with the preceding letter would be tidier.
    > > I realise that from a processing perspective it might be irrelevant,
    > > but it would be nice if the names of these control characters still
    > > suggested something about their use.
    > Yes, but the two options need to be considered with the possible caveats
    > of existing implementations. I don't know which is better for collation
    > purposes (I was told that CGJ should never be rendered, but just used to
    > control collation and canonical reordering, which is why I proposed it:
    > it is not really part of the sequence, but is just inserted to avoid the
    > normalization caveat, so a collator would just skip over it after
    > normalization, and a renderer that performs normalization first could
    > then process the string assuming a consistent order of sequences,
    > without having to consider the case of CGJ).
    > Also ZWNJ suggests a break, which may cause caveats as holam male is
    > likely to occur in the middle or at end of a word, and any attempt to
    > isolate it from the beginning of the word would be disastrous. Is ZWNJ
    > creating a break? I need to recheck its status in the existing Unicode
    > reference then.

    ZWNJ and CGJ are both Line Break class CM, which means that inserting
    either will have no effect on line breaking compared to what would happen
    if it weren't there. (Assuming that the preceding character isn't SPACE of
    Both are canonical combining class 0, so both will remain inert under
    normalization. Neither line breaking nor normalization therefore provides
    any reason to prefer one over the other.
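
    The normalization half of this can be checked with a short sketch in
    Python, using only the standard unicodedata module. The sample sequence
    (BET + DAGESH before the separator, HOLAM + VAV after it) and the helper
    names is_stable and with_separator are assumptions made up for
    illustration; they are not taken from the proposal. The module does not
    expose the Line Break property, so that half is not covered here, and the
    values reported are those of whatever Unicode version the installed
    Python ships.

```python
import unicodedata

ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
CGJ = "\u034f"     # COMBINING GRAPHEME JOINER
BET = "\u05d1"     # HEBREW LETTER BET
VAV = "\u05d5"     # HEBREW LETTER VAV
DAGESH = "\u05bc"  # HEBREW POINT DAGESH OR MAPIQ (combining class 21)
HOLAM = "\u05b9"   # HEBREW POINT HOLAM (combining class 19)


def is_stable(text):
    """Return True if the text is unchanged by both NFC and NFD."""
    return all(unicodedata.normalize(form, text) == text
               for form in ("NFC", "NFD"))


def with_separator(sep):
    """Hypothetical test sequence: BET + DAGESH, the separator, HOLAM + VAV."""
    return BET + DAGESH + sep + HOLAM + VAV


if __name__ == "__main__":
    for name, ch in (("ZWNJ", ZWNJ), ("CGJ", CGJ)):
        print(name,
              "ccc =", unicodedata.combining(ch),
              "stable =", is_stable(with_separator(ch)))
    # With no separator, DAGESH (21) and HOLAM (19) are adjacent marks and
    # canonical ordering is free to swap them.
    print("no separator: stable =", is_stable(with_separator("")))
```

    With either character in place, the holam cannot be reordered across it
    onto the preceding letter, since a combining class 0 character stops
    canonical reordering. Without a separator, the dagesh and holam are
    swapped into combining-class order.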
