From: Ernest Cline (ernestcline@mindspring.com)
Date: Tue Apr 13 2004 - 12:15:38 EDT
> [Original Message]
> From: Philippe Verdy <verdy_p@wanadoo.fr>
>
> From: "John Hudson" <tiro@tiro.com>
> > Philippe Verdy wrote:
> >
> > >>>The problem with <HOLAM, VAV> is that it may follow (in the encoded
> > >>>sequence) some other grapheme cluster terminated by other cantillation
> > >>>marks. So for me the best candidate would be: <CGJ, HOLAM, VAV>...
> >
> > Would ZWNJ perform the same function?
> >
> > If the intent is that the holam be associated with the vav rather than the
> > preceding letter, it seems to me that a control character that does not
> > suggest joining or combining with the preceding letter would be tidier.
> > I realise that from a processing perspective it might be irrelevant, but it
> > would be nice if the names of these control characters still
> > suggested something about their use.
>
> Yes, but the two options need to be considered with the possible caveats with
> existing implementations. I don't know which is better for collation purposes (I
> was told that CGJ should never be rendered, but just used to control and avoid
> canonical reordering, which is why I proposed it: it is not really part of the
> sequence, but is just inserted to avoid the normalization caveat, so a renderer
> would just skip over it after normalization, and a renderer that performs
> normalization first could then process the string assuming a consistent order of
> sequences, without having to consider the case of CGJ).
>
> Also ZWNJ suggests a break, which may cause problems, as holam male is expected
> to occur in the middle or at the end of a word, and any attempt to isolate it from
> the beginning of the word would be disastrous. Does ZWNJ create a break
> opportunity? I need to recheck its status in the existing Unicode reference then.

ZWNJ and CGJ are both Line Break class CM, which means that inserting
either will have no effect on line breaking compared to what would happen
if it weren't there (assuming, of course, that the preceding character
isn't SPACE).

Both are canonical combining class 0, so both will remain inert under
normalization. So neither line breaking nor normalization provides any
reason to prefer one over the other.
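
For anyone who wants to check the normalization half of this, here is a
minimal sketch using Python's standard unicodedata module. The test
string is my own assumption for illustration: BET carrying ETNAHTA
(combining class 220), followed by HOLAM (class 19) and VAV. The helper
name survives_normalization is hypothetical. The module does not expose
the Line Break property, so the sketch says nothing about line breaking.

```python
import unicodedata

# Illustrative code points (the sample word is an assumption, not from the thread).
BET, VAV = "\u05D1", "\u05D5"   # Hebrew letters, combining class 0
ETNAHTA = "\u0591"              # cantillation accent, combining class 220
HOLAM = "\u05B9"                # vowel point, combining class 19
CGJ, ZWNJ = "\u034F", "\u200C"  # the two candidate control characters

def survives_normalization(text, form="NFC"):
    """Return True if normalizing `text` leaves the code point order unchanged."""
    return unicodedata.normalize(form, text) == text

# Without a separator, HOLAM (19) sorts ahead of ETNAHTA (220) and so
# migrates onto the preceding letter's mark sequence.
bare = BET + ETNAHTA + HOLAM + VAV
# Either control character has combining class 0 and blocks the reordering.
with_cgj = BET + ETNAHTA + CGJ + HOLAM + VAV
with_zwnj = BET + ETNAHTA + ZWNJ + HOLAM + VAV

for label, text in (("bare", bare), ("CGJ", with_cgj), ("ZWNJ", with_zwnj)):
    print(label, survives_normalization(text))

for name, ch in (("CGJ", CGJ), ("ZWNJ", ZWNJ)):
    print(name, "combining class:", unicodedata.combining(ch))
```

In this sketch only the bare sequence is reordered; the CGJ and ZWNJ
variants come through both NFC and NFD untouched, which is the sense in
which the two are equivalent for normalization.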
This archive was generated by hypermail 2.1.5 : Tue Apr 13 2004 - 13:02:24 EDT