Re: Questions on ZWNBS - for line initial holam plus alef

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Aug 06 2003 - 19:13:21 EDT

Next message: John Jenkins: "Re: Conflicting principles"

Previous message: Michael Everson: "Re: Conflicting principles"
Maybe in reply to: Peter Kirk: "Re: Questions on ZWNBS - for line initial holam plus alef"
Next in thread: Doug Ewell: "Re: Questions on ZWNBS - for line initial holam plus alef"
Reply: Doug Ewell: "Re: Questions on ZWNBS - for line initial holam plus alef"
Reply: John Cowan: "Re: Questions on ZWNBS - for line initial holam plus alef"

Philippe Verdy said:

> > The same thing can be said about any inserted invisible character,
> > combining or not.
> >
> > How is <a, ring above, null, dot below> supposed to be different from
> > <a, dot below, null, ring above>?
> >
> > How is <a, ring above, LRM, dot below> supposed to be different from
> > <a, dot below, LRM, ring above>?
> >
> > In display, they might not be distinct, unless you were doing some
> > kind of show-hidden display. Yet these sequences are not canonically
> > equivalent, and the presence of an embedded control character or an
> > embedded format control character would block canonical reordering.
>
>
> I disagree with you: using an LRM mark in the middle of a combining
> sequence conforms to canonicalization rules but is clearly
> ill-formed,

It is not. TUS 4.0, p. 71:

D17a Defective combining character sequence: A combining character
     sequence that does not start with a base character.

     * Defective combining character sequences occur when a sequence
       of combining characters appears at the start of a string or
       follows a control or format character. Such sequences are
       defective from the point of view of handling of combining
       marks, but are not ill-formed.
              ^^^^^^^^^^^^^^^^^^^^^^
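To make D17a concrete, here is a minimal Python sketch that flags where a defective combining character sequence begins. It is a simplification under stated assumptions: "combining character" is approximated by general category M* (Mn, Mc, Me), and only control (Cc) and format (Cf) predecessors are treated as breaking the sequence. The function name defective_mark_positions is hypothetical, not anything from the standard.

```python
import unicodedata

def defective_mark_positions(s: str) -> list:
    """Return indices of combining marks that begin a defective combining
    character sequence: a mark at the start of the string, or a mark that
    directly follows a control (Cc) or format (Cf) character.

    Assumption: "combining character" is approximated by general category
    M*; the standard's full definitions are more detailed.
    """
    positions = []
    for i, ch in enumerate(s):
        if not unicodedata.category(ch).startswith("M"):
            continue  # not a combining mark, nothing to check
        if i == 0 or unicodedata.category(s[i - 1]) in ("Cc", "Cf"):
            positions.append(i)
    return positions

# <a, ring above, NULL, dot below>: the dot below starts a defective sequence.
print(defective_mark_positions("a\u030A\u0000\u0323"))  # [3]
# <a, ring above, LRM, dot below>: LRM is Cf, so the same thing happens.
print(defective_mark_positions("a\u030A\u200E\u0323"))  # [3]
# <a, ring above, CGJ, dot below>: CGJ is itself Mn, so nothing is defective.
print(defective_mark_positions("a\u030A\u034F\u0323"))  # []
```

Note that the sketch only reports such sequences; consistent with D17a, they are not treated as errors.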

> as well as using a NULL control in the middle, which
> breaks the combining sequence.

I'm not claiming it doesn't break the combining sequence. Of
course it does. It creates a defective combining character
sequence, and that poses a challenge for rendering, since it
departs from the usual expectations for normal combining
character sequences. The renderer has to split hairs between
the fact that it is dealing with a defective combining
character sequence and the fact that it is dealing with a
default ignorable character which is supposed to be ignored
for text processes it is not immediately applicable to.

But I challenge you to find anything in the standard that
*prohibits* such sequences from occurring.

And *if* they occur, they are not canonically equivalent, which
was the point I was making to Kent.
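The equivalence point can be checked with Python's standard unicodedata module. This is a minimal sketch that assumes "canonically equivalent" is tested by comparing NFD forms. RING ABOVE (U+030A) has combining class 230 and DOT BELOW (U+0323) has class 220, so canonical reordering swaps them when they are adjacent. NULL and LRM both have class 0, which blocks that reordering.

```python
import unicodedata

RING_ABOVE = "\u030A"  # combining class 230
DOT_BELOW = "\u0323"   # combining class 220
NULL = "\u0000"        # control character, combining class 0
LRM = "\u200E"         # format control character, combining class 0

def canonically_equivalent(x: str, y: str) -> bool:
    # Two strings are canonically equivalent iff their NFD forms match.
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)

# Adjacent marks are sorted by combining class, so both orders collapse.
print(canonically_equivalent("a" + RING_ABOVE + DOT_BELOW,
                             "a" + DOT_BELOW + RING_ABOVE))  # True

# A class-0 character in between blocks reordering; the orders stay distinct.
for sep in (NULL, LRM):
    print(canonically_equivalent("a" + RING_ABOVE + sep + DOT_BELOW,
                                 "a" + DOT_BELOW + sep + RING_ABOVE))  # False
```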

> The proposal to use CGJ, however, is legal: it does not break
> combining sequences or grapheme clusters, so the whole sequence
> encoded with CGJ will be considered by rendering engines, where CGJ
> is a no-op for rendering but not for canonical ordering ...

Well, yes, which is why I have been advocating it as the
solution to the Biblical Hebrew text representation problem.
I agree with you about that. But it need not be characterized
as "legal" in opposition to the other examples I cited above.
All of these sequences are "legal" and allowed by the
standard.
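A minimal Python sketch of the CGJ approach for Biblical Hebrew follows. It assumes the commonly cited case of a lamed carrying patah followed by hiriq, as in Biblical spellings of Jerusalem. HIRIQ (U+05B4) has combining class 14 and PATAH (U+05B7) has class 17, so normalization moves hiriq ahead of patah. COMBINING GRAPHEME JOINER (U+034F) has class 0 and blocks the swap. The helper name nfd is just a local shorthand.

```python
import unicodedata

LAMED = "\u05DC"
HIRIQ = "\u05B4"  # combining class 14
PATAH = "\u05B7"  # combining class 17
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0

def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)

# Intended order: patah, then hiriq. Normalization sorts by class (14 < 17),
# silently swapping the two vowel points.
plain = LAMED + PATAH + HIRIQ
print(nfd(plain) == LAMED + HIRIQ + PATAH)  # True

# With CGJ between the points, reordering is blocked and the
# intended order survives normalization unchanged.
with_cgj = LAMED + PATAH + CGJ + HIRIQ
print(nfd(with_cgj) == with_cgj)  # True
```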

--Ken

This archive was generated by hypermail 2.1.5 : Wed Aug 06 2003 - 20:21:25 EDT