Re: ZWNJ & Persian Collation

From: Markus Scherer
Date: Wed Mar 12 2003 - 11:48:54 EST

    Roozbeh Pournader wrote:
    > Well, anything that is completely ignored in collation creates problems
    > with deterministic sorting.

    I don't think you mean "deterministic". UCA is deterministic; it just sorts many strings as equal.
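
    To illustrate the distinction, here is a minimal sketch, not UCA itself. The function
    key_ignoring_zwnj is a hypothetical stand-in for a sort key in which ZWNJ is completely
    ignorable, and the two strings are placeholders rather than real Persian words. The
    comparison is deterministic (the keys are always equal), but a stable sort then leaves
    the two strings in whatever order they arrived.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def key_ignoring_zwnj(s):
    # Hypothetical stand-in for a collation key where ZWNJ is fully ignorable.
    # Real UCA keys are built from collation weights, not raw code points.
    return s.replace(ZWNJ, "")

a = "ab"
b = "a" + ZWNJ + "b"

# Python's sorted() is stable, so items with equal keys keep their input order.
first = sorted([a, b], key=key_ignoring_zwnj)
second = sorted([b, a], key=key_ignoring_zwnj)
```

    Both calls are fully deterministic, yet first and second differ because the keys tie.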

    > There are certain words in Persian, with
    > completely different meanings, that only differ in a ZWNJ[1]. Having ZWNJ
    > ignored by default, means they may appear in this or that order, possibly
    > based on the original order of input. I guess this is not what we want
    > for deterministic collation.
    > The desired behavior for ZWNJ, is being treated like punctuations.
    > Ignored in the first levels, but considered at the end. (Personal Note:
    > write something for UTC on this.)

    Possible. I assume that ZWNJ is ignored in UCA because that is the expected behavior for many other
    languages. Not ignoring ZWNJ is possible with a tailoring that gives it some non-zero weights.
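
    As a rough sketch of what such a tailoring achieves, assuming a simplified two-level key
    instead of real UCA weights: the hypothetical tailored_key below ignores ZWNJ at the first
    level and uses its positions only as a final tie-breaker. An actual tailoring would be
    expressed through a collation library's own rule mechanism, which is not shown here.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def tailored_key(s):
    # First level: the string with ZWNJ removed, so ZWNJ never outranks
    # a difference in ordinary letters. (Assumption: plain code point
    # order stands in for real primary collation weights.)
    base = s.replace(ZWNJ, "")
    # Last level: where the ZWNJs occur, consulted only when the first
    # level ties.
    zwnj_positions = tuple(i for i, ch in enumerate(s) if ch == ZWNJ)
    return (base, zwnj_positions)

a = "ab"
b = "a" + ZWNJ + "b"

# The result no longer depends on the input order.
order_1 = sorted([a, b], key=tailored_key)
order_2 = sorted([b, a], key=tailored_key)
```

    With this key the two strings no longer compare as equal, while a string such as "ac"
    still sorts after both of them because the first level decides that comparison.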

    Note that many languages require tailorings for at least a couple of characters to follow national


    Opinions expressed here may not reflect my company's positions unless otherwise noted.
