From: Philippe Verdy (firstname.lastname@example.org)
Date: Wed Aug 06 2003 - 18:22:23 EDT
On Wednesday, August 06, 2003 10:19 PM, Kenneth Whistler <email@example.com> wrote:
> Kent Karlsson responded:
> > > > I see no particular *technical* problem with using WJ, though.
> > > > In contrast to the suggestion of using CGJ (re. another problem)
> > > > anywhere else but at the end of a combining sequence. CGJ has
> > > > combining class 0, despite being invisible and not ("visually")
> > > > interfering with any other combining mark. Using CGJ at a
> > > > non-final position in a combining sequence puts in doubt the
> > > > entire idea with combining classes and normal forms.
> > >
> > > Why?
> > See above (I DID write the motivation!).
> I guess that I did not (and still do not) see the motivation for
> your final statement.
> > Combining classes are generally
> > assigned according to "typographic placement". Combining characters
> > (except those that are really letters) that have the "same"
> > placement, and "interfere typographically" are assigned the same
> > combining class, while those that don't get different classes, and
> > the relative order is then considered unimportant (canonically
> > equivalent). How is then,
> > e.g. <a, ring above, cgj, dot below> supposed to be different from
> > <a, dot below, cgj, ring above> (supposing all involved characters
> > are fully supported), when <a, ring above, dot below> is NOT
> > supposed to be much different from <a, dot below, ring above>
> > (them being canonically equivalent)? An invisible combining
> > character does not interfere typographically with anything, it
> > being invisible!
> The same thing can be said about any inserted invisible character,
> combining or not.
> How is: <a, ring above, null, dot below> supposed to be different from
> <a, dot below, null, ring above>
> How is: <a, ring above, LRM, dot below> supposed to be different from
> <a, dot below, LRM, ring above>
> In display, they might not be distinct, unless you were doing some
> kind of show-hidden display. Yet these sequences are not canonically
> equivalent, and the presence of an embedded control character or an
> embedded format control character would block canonical reordering.
I disagree with you. Using an LRM in the middle of a combining
sequence conforms to the canonicalization rules but is clearly
ill-formed, as is using a NULL control in the middle: either one
breaks the combining sequence.
So in your two examples above, inserting the LRM or NULL splits
one combining sequence into three, each with its own properties,
and the last one is ill-formed because its combining character
follows a control rather than a base or another combining character.
The proposal to use CGJ, however, is legal: it does not break
combining sequences or grapheme clusters, so rendering engines
will treat the whole sequence encoded with CGJ as a unit. CGJ is
a no-op for rendering but not for canonical ordering. I see only
two well-formed uses for it: as a canonical ordering fix in the
NF* normalization forms, or before a base character, to extend
the combining sequences seen by renderers, character parsers
and breakers.
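
As a rough illustration of the canonical ordering point, here is a
minimal Python sketch using the standard unicodedata module. The
choice of NFD and the variable names are assumptions made for the
example. The code points are ring above (U+030A, class 230), dot
below (U+0323, class 220), CGJ (U+034F), LRM (U+200E) and NULL (U+0000).

```python
import unicodedata

RING_ABOVE = "\u030A"  # combining class 230
DOT_BELOW = "\u0323"   # combining class 220

# With nothing in between, NFD reorders the two marks by combining class.
plain = "a" + RING_ABOVE + DOT_BELOW
plain_nfd = unicodedata.normalize("NFD", plain)
plain_reordered = plain_nfd == "a" + DOT_BELOW + RING_ABOVE

# Any class-0 character between the marks blocks that reordering,
# whether it is CGJ, LRM or NULL.
blocked = {}
for name, ch in [("CGJ", "\u034F"), ("LRM", "\u200E"), ("NULL", "\u0000")]:
    seq = "a" + RING_ABOVE + ch + DOT_BELOW
    blocked[name] = (unicodedata.combining(ch) == 0
                     and unicodedata.normalize("NFD", seq) == seq)

print(plain_reordered, blocked)
```

This only shows the effect on canonical ordering. It says nothing
about how the sequences are segmented or rendered, which is where
CGJ differs from LRM and NULL.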
So your example with:
<a, dot below, LRM, ring above>
would in fact be rendered and parsed as three combining sequences:
<a, dot below>, <LRM>, <ring above>
i.e. a well-formed <a with dot below>, a control (normally invisible,
though an editor may show it as a visible glyph in a dotted square,
as in the Unicode charts), and an ill-formed isolated <ring above>
(most probably rendered on a dotted circle).
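
A rough sketch of that splitting in Python, assuming a much
simplified rule (a combining mark attaches to the preceding
character unless that character is a control or format character)
rather than a full text-boundary algorithm. The helper name
split_combining_sequences is hypothetical.

```python
import unicodedata

def split_combining_sequences(text):
    """Hypothetical helper: simplified split, not a full boundary algorithm."""
    sequences = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        # Cc = control (e.g. NULL), Cf = format control (e.g. LRM)
        after_control = (bool(sequences)
                         and unicodedata.category(sequences[-1][-1]) in ("Cc", "Cf"))
        if is_mark and sequences and not after_control:
            sequences[-1] += ch      # mark extends the current sequence
        else:
            sequences.append(ch)     # base, control, or isolated mark
    return sequences

with_lrm = split_combining_sequences("a\u0323\u200E\u030A")
with_cgj = split_combining_sequences("a\u0323\u034F\u030A")
print(len(with_lrm), len(with_cgj))
```

Under this assumed rule the LRM case yields three pieces and the
CGJ case stays in one piece, because CGJ is itself a combining mark.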
So it cannot be considered equivalent to, or even rendered like,
<a, dot below, ring above>
or its canonical equivalents (not in normalized order, but still
conforming, well-formed, and handled equivalently):
<a, ring above, dot below>
<a with ring above, dot below>
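
The equivalence of these three spellings can be checked with a
short Python sketch using the standard unicodedata module.
Comparing NFD forms is one assumed way to test canonical
equivalence, and the variable names are illustrative only.

```python
import unicodedata

candidates = [
    "a\u0323\u030A",   # <a, dot below, ring above>
    "a\u030A\u0323",   # <a, ring above, dot below>
    "\u00E5\u0323",    # <a with ring above, dot below>
]
# Canonically equivalent strings share a single NFD form.
nfd_forms = {unicodedata.normalize("NFD", s) for s in candidates}

# The sequence with an embedded LRM normalizes to something else.
with_lrm = unicodedata.normalize("NFD", "a\u0323\u200E\u030A")
print(len(nfd_forms) == 1, with_lrm in nfd_forms)
```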
-- Philippe. Spam is not tolerated: any unsolicited message will be reported to your Internet service providers.