RE: ZWJ, ZWNJ, CGJ and combination

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Mon Nov 10 2003 - 06:01:59 EST

    Peter Kirk wrote:

    > But does the Khmer script follow this rule? Please bear in mind that I
    > know nothing about this script. But in TUS v4.0 10.4 p.281 I read:
    >
    > > Ordering of Syllable Components. The standard order of components in
    > > an orthographic syllable as expressed in BNF is
    > > B {R | C} {S {R}}* {{Z} V} {O} {S}
    ...
    > > Z is the zero width non-joiner
    ...
    > The first example given using ZWNJ, on p.282, starts with ba + ZWNJ +
    > triisap + ii, i.e. <1794, ZWNJ, 17CA, 17B8>. 1794 is a base character
    > (Lo), but 17CA and 17B8 are class 0 combining characters (Mn). The
    > syntax implies that other Mn characters, e.g. robat, 17CC, may occur
    > between the base character and the ZWNJ. So here is a case in natural
    > language where ZWNJ may be both preceded and followed by combining
    > characters, giving a technically defective combining
    > sequence. Or have I misunderstood things here?
    >
    > Note that I am not proposing a change to Khmer, but just a clarification
    > of definitions and the consistency of their application, and a good
    > reason why what is allowed in Khmer would not be allowed in Hebrew.
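
    The character properties cited in the quoted question can be checked
    mechanically. The sketch below is only an illustration, not part of
    either message: it uses Python's standard unicodedata module, and the
    helper names describe and marks_without_base are hypothetical. The
    "base" test is a deliberate simplification (letters, numbers,
    punctuation, symbols and spaces count as bases; format characters such
    as ZWNJ do not), which matches the reading in the question rather than
    any normative definition.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def describe(seq):
    """Return (code point, General_Category, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))
        for ch in seq
    ]

def marks_without_base(seq):
    """Return indexes of combining marks that are not attached to a base.

    Simplifying assumption: a base is any character whose category starts
    with L, N, P or S, or is Zs. Anything else that is not a mark (for
    example the format character ZWNJ, category Cf) breaks the sequence.
    """
    orphaned = []
    has_base = False
    for i, ch in enumerate(seq):
        cat = unicodedata.category(ch)
        if cat.startswith("M"):
            if not has_base:
                orphaned.append(i)
        else:
            has_base = cat[0] in "LNPS" or cat == "Zs"
    return orphaned

# ba + ZWNJ + triisap + ii, the sequence quoted from p.282
example = "\u1794" + ZWNJ + "\u17ca\u17b8"

for row in describe(example):
    print(row)
print(marks_without_base(example))
```

    Under that simplified reading, both marks after the ZWNJ are reported
    as having no base, while the same sequence without the ZWNJ reports
    none.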

    I regard this use of ZWJ and ZWNJ as a mistake. Its publication,
    however, prompted me to propose making ZWJ and ZWNJ combining
    characters. That proposal was not accepted, because it would interfere
    with the Bidi algorithm. I am not sure how bad that would be, though.
    (I would not be surprised if it were even beneficial, although it would
    be a break in method compared with the current specification.)
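
    For reference, the current Bidi_Class values of the characters under
    discussion can be listed with Python's standard unicodedata module.
    This is only a lookup of existing property values; that the objection
    concerned these particular classes is an assumption, and the snippet
    says nothing about what the proposal would have changed. The
    dictionary names are hypothetical.

```python
import unicodedata

# Characters named in the thread, keyed by informal labels.
chars = {
    "ZWNJ": "\u200c",
    "ZWJ": "\u200d",
    "triisap": "\u17ca",  # a Khmer combining mark (Mn)
    "ba": "\u1794",       # a Khmer base letter (Lo)
}

# BN = boundary neutral, NSM = nonspacing mark, L = left-to-right.
bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}

for name, cls in bidi_classes.items():
    print(f"{name:8} {cls}")
```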

                    /kent k




