Re: ZWNJ & Persian Collation

From: Roozbeh Pournader (roozbeh@sharif.edu)
Date: Wed Mar 12 2003 - 10:37:57 EST


    On Tue, 11 Mar 2003, Markus Scherer wrote:

    > The Unicode Collation Algorithm (UCA) for which allkeys.txt is the
    > default weight table does treat ZWNJ and a number of other characters as
    > special. For these, they are completely ignored by the UCA - same as if
    > you stripped them from the text.

    Well, anything that is completely ignored in collation creates problems
    with deterministic sorting. There are certain words in Persian, with
    completely different meanings, that differ only by a ZWNJ[1]. Having ZWNJ
    ignored by default means they may appear in this or that order, possibly
    based on the original order of input. I guess this is not what we want
    for deterministic collation.
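
    A minimal sketch of the problem, assuming a stand-in key function
    (the hypothetical ignoring_key below) that simply strips ZWNJ and
    compares code points. This is not the UCA and uses no weights from
    allkeys.txt; it only shows that two distinct words with equal keys
    come out in whatever order they went in:

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def ignoring_key(s):
    # Hypothetical stand-in for a collation key that ignores ZWNJ
    # completely. Real UCA keys are built from weight tables, not
    # from raw code points.
    return s.replace(ZWNJ, "")

# The two words from footnote [1], spelled with escapes to avoid
# any ambiguity about where the ZWNJ sits.
names_of = "\u0646\u0627\u0645" + ZWNJ + "\u0647\u0627\u06cc"  # nam + ZWNJ + hay
a_letter = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06cc"  # nameh + ZWNJ + i

# Python's sort is stable, so items with equal keys keep input order.
order_1 = sorted([names_of, a_letter], key=ignoring_key)
order_2 = sorted([a_letter, names_of], key=ignoring_key)

print(ignoring_key(names_of) == ignoring_key(a_letter))
print(order_1 == order_2)
```

    The two words get the same key, so the sorted result depends only
    on the order of the input.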

    The desired behavior for ZWNJ is to be treated like punctuation:
    ignored at the first levels, but considered at the end. (Personal note:
    write something for UTC on this.)
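
    One way to picture "ignored first, considered at the end" is a
    two-part key: the text with ZWNJ stripped, then the original text
    as a final tie-breaker. The sketch below assumes code point order
    as a stand-in for real collation weights, and tiebreak_key is a
    hypothetical name; it illustrates the idea, not how any UCA
    implementation actually assigns levels:

```python
from itertools import permutations

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def tiebreak_key(s):
    # First part: ZWNJ ignored (stand-in for the earlier levels).
    # Second part: the full string, so ZWNJ differences still
    # decide the order when everything else is equal.
    return (s.replace(ZWNJ, ""), s)

plain    = "\u0646\u0627\u0645\u0647\u0627\u06cc"              # no ZWNJ
a_letter = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06cc"  # ZWNJ after heh
names_of = "\u0646\u0627\u0645" + ZWNJ + "\u0647\u0627\u06cc"  # ZWNJ after meem

words = [plain, a_letter, names_of]

# Sort every possible input order and collect the distinct results.
results = {tuple(sorted(p, key=tiebreak_key)) for p in permutations(words)}

print(len(results))
```

    With the tie-breaker in place, all six input orders produce a
    single output order, which is what deterministic collation needs.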

    roozbeh

    [1] A good example is نام‌های or نامهای (names of) vs.
    نامه‌ای (a letter). The only difference in their encoding is the
    presence or absence of a ZWNJ, or its position in the word.


