Re: FW: ZWNJ & Persian Collation

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Mar 11 2003 - 16:43:05 EST


    Magda Danish (Unicode) wrote:
    > > -----Original Message-----
    > > From: Vladimir Ivanov [mailto:iranorus@online.ru]

    > > It is clearly seen that there are letters on both sides of ZWNJ within the
    > > word boundaries. Placing ZWNJ on an edge of the word doesn’t make sense in
    > > Persian. From this point of view ZWNJ should be treated as a special
    > > character rather than a delimiter.

    The Unicode Collation Algorithm (UCA), for which allkeys.txt is the default weight table, does
    treat ZWNJ and a number of other characters as special: they are completely ignored by the UCA,
    the same as if you had stripped them from the text.
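
    As a rough illustration of "the same as if you stripped them", here is a minimal Python sketch.
    The names IGNORABLE, strip_ignorables and differ_only_by_ignorables are made up for this
    example, and the set holds only ZWNJ rather than the full list of ignored characters.

```python
# Sketch only: a stand-in for "completely ignorable" characters.
# Assumption: the set holds just ZWNJ; the real table ignores more characters.
IGNORABLE = {"\u200c"}  # U+200C ZERO WIDTH NON-JOINER


def strip_ignorables(text):
    """Return text with the ignorable characters removed."""
    return "".join(ch for ch in text if ch not in IGNORABLE)


def differ_only_by_ignorables(a, b):
    """True if a and b are identical once ignorable characters are dropped."""
    return strip_ignorables(a) == strip_ignorables(b)


# A Persian word written with and without ZWNJ after the prefix.
with_zwnj = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
without_zwnj = "\u0645\u06cc\u062e\u0648\u0627\u0647\u0645"

print(differ_only_by_ignorables(with_zwnj, without_zwnj))
```

    Under default UCA weights, two such strings get the same sort key unless the implementation
    adds a tie-breaker of its own.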

    > > But in Allkeys Table it is placed on line #68 well before other popular
    > > delimiters: HORIZONTAL TABULATION line #192,

    The order of entries in allkeys.txt is irrelevant; what is relevant is the assignment of weights,
    and ZWNJ gets all-zero weights. You need to implement the algorithm, not just rely on the relative
    order of entries in the file. (allkeys.txt does sort its entries by shifted, multi-level weights,
    but the order of same-weight characters does not matter.)
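
    To make "weights, not file order" concrete, here is a simplified Python sketch of sort-key
    formation. TOY_WEIGHTS and sort_key are hypothetical names, the weight values are invented (they
    are not the ones in allkeys.txt), and contractions, expansions and variable weighting are all
    left out.

```python
# Hypothetical (primary, secondary, tertiary) weights; NOT the allkeys.txt values.
# The order of entries in this dict plays no role in the result.
TOY_WEIGHTS = {
    "c": (12, 1, 1),
    "\u200c": (0, 0, 0),  # ZWNJ: all-zero weights, so it adds nothing to the key
    "a": (10, 1, 1),
    "b": (11, 1, 1),
}


def sort_key(text):
    """Build a simplified multi-level sort key.

    For each level in turn, append every non-zero weight at that level,
    then a 0 as the level separator.
    """
    elements = [TOY_WEIGHTS[ch] for ch in text]
    key = []
    for level in range(3):
        key.extend(w[level] for w in elements if w[level] != 0)
        key.append(0)  # level separator
    return tuple(key)


words = ["b", "a\u200cb", "ac", "ab"]
print(sorted(words, key=sort_key))
```

    Because zero weights are skipped, "a" + ZWNJ + "b" and "ab" produce identical keys, wherever
    ZWNJ happens to be listed in the table.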

    > > I’ve solved this problem for myself by placing ZWNJ somewhere after
    > > delimiters, but what are the theoretical reasons for putting
    > > it before them?
    > > In order to get what? In what languages?

    "Before" is wrong; see above. Think of ZWNJ as "not there" for the UCA.

    > > By the way, the sorting algorithm built into MS Windows puts compound words
    > > with ZWNJ AFTER their simple components. So in this respect it acts on the
    > > principles different from Allkeys Table.

    Windows does not implement the Unicode Collation Algorithm, as far as I know.

    Best regards,
    markus

    -- 
    Opinions expressed here may not reflect my company's positions unless otherwise noted.
    

