RE: Sort Order

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Dec 04 2003 - 22:04:00 EST


    mjabbar@bangla.net writes:
    > Please also inform me about what will be the sorting for Bangla.
    > Thanks and regards
    > Mustafa Jabbar

    Same response: you don't sort on codepoints but with the Unicode Collation
    Algorithm (UCA) and the Default Unicode Collation Element Table (DUCET).
    The DUCET is published in the Unicode charts, and is also available in
    compiled forms, for example as a text file of collation rules (see
    UCARules.txt in ICU) or as a complete conversion table from codepoints to
    collation weights.
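
    To make the difference between codepoint order and weight-based order
    concrete, here is a minimal Python sketch. The weight table is a
    hypothetical toy (it does not contain real DUCET values), and the key
    layout (all primaries, then all secondaries, then all tertiaries, then
    codepoints as a final tie-breaker) is only a simplified picture of what a
    UCA implementation does.

```python
# Hypothetical weights for illustration only; these are NOT DUCET values.
# Each character maps to one collation element: (primary, secondary, tertiary).
TOY_WEIGHTS = {
    "a": (1, 0, 0), "A": (1, 0, 1),
    "á": (1, 1, 0), "Á": (1, 1, 1),
    "b": (2, 0, 0), "B": (2, 0, 1),
}


def sort_key(text):
    """Build a multi-level key: primaries, secondaries, tertiaries, codepoints."""
    elements = [TOY_WEIGHTS[ch] for ch in text]
    primary = tuple(e[0] for e in elements)
    secondary = tuple(e[1] for e in elements)
    tertiary = tuple(e[2] for e in elements)
    codepoints = tuple(ord(ch) for ch in text)  # last-resort tie-breaker
    return (primary, secondary, tertiary, codepoints)


words = ["b", "áb", "B", "ab", "Ab", "a"]

# Plain codepoint order: uppercase before lowercase, accented letters last.
by_codepoint = sorted(words)

# Weight-based order: base letters first, then accents, then case.
by_weights = sorted(words, key=sort_key)

print(by_codepoint)
print(by_weights)
```

    In this toy table, a difference in base letter always wins over an accent
    difference, which in turn wins over a case difference, no matter where in
    the string each difference occurs.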

    For Bangla, the DUCET will certainly not be enough to meet all your needs,
    and you'll probably need to tailor the collation order using expansion rules
    and swaps, with more collation levels than what is shown in the DUCET (which
    documents only 3 levels before the codepoint order: primary, secondary,
    tertiary).
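
    As a rough illustration of what tailoring with expansions and swaps means,
    the sketch below layers a small override table on top of a default table.
    Everything in it is assumed for the example: the Latin letters, the
    weights, and the chosen rules are placeholders, not real DUCET data and
    not an actual Bangla tailoring.

```python
from collections import ChainMap

# Hypothetical default table: each character maps to a LIST of collation
# elements (primary, secondary, tertiary). Not real DUCET values.
DEFAULT = {
    "a": [(1, 0, 0)],
    "e": [(2, 0, 0)],
    "o": [(3, 0, 0)],
    "z": [(4, 0, 0)],
}

# Hypothetical tailoring layered over the defaults.
TAILORING = {
    # Swap: make "o" sort before "e" by giving it a primary between a and e.
    "o": [(1.5, 0, 0)],
    # Expansion: "æ" sorts like "a" followed by "e", with a secondary
    # difference so that it still comes after the plain sequence "ae".
    "æ": [(1, 0, 0), (2, 1, 0)],
}

# Lookups try TAILORING first, then fall back to DEFAULT.
TAILORED = ChainMap(TAILORING, DEFAULT)


def sort_key(text, table):
    """Flatten every character's collation elements, then split by level."""
    elements = [ce for ch in text for ce in table[ch]]
    return tuple(tuple(ce[level] for ce in elements) for level in range(3))


words = ["ez", "oz", "æz", "az", "aez"]
tailored_order = sorted(words, key=lambda w: sort_key(w, TAILORED))
default_order = sorted(["ez", "oz", "az"], key=lambda w: sort_key(w, DEFAULT))

print(default_order)
print(tailored_order)
```

    A real tailoring would normally be written as collation rules and handed
    to a full implementation rather than coded by hand like this.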

    It will however be simpler than sorting Thai in logical (phonetic) order,
    which requires preprocessing to find grapheme clusters and syllables with
    a dictionary, unless you prefer to sort simply on the visual order. I
    confess that I have not attempted to sort any Thai data. If I had to, I
    would need to use a complete implementation such as the one found in ICU
    (but ICU is quite large for some projects).
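
    For a feel of the gap between visual and logical order in Thai, the sketch
    below uses a much cruder approximation than the dictionary-based
    preprocessing mentioned above: it only swaps each leading vowel (U+0E40 to
    U+0E44, which are stored before the consonant they follow in speech) with
    the consonant right after it. The function names are invented for this
    example, and sorting the result by codepoint merely stands in for a real
    collation step.

```python
# Thai vowels that are written (and encoded) before their consonant.
PREVOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def is_thai_consonant(ch):
    """True for the Thai consonant letters U+0E01 (ko kai) to U+0E2E (ho nokhuk)."""
    return "\u0E01" <= ch <= "\u0E2E"


def to_logical_order(text):
    """Swap each leading vowel with the consonant that follows it.

    A simplification: no syllable or dictionary analysis is attempted.
    """
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREVOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return "".join(chars)


words = [
    "\u0E44\u0E01",  # sara ai maimalai + ko kai (vowel stored first)
    "\u0E02\u0E32",  # kho khai + sara aa
    "\u0E01\u0E32",  # ko kai + sara aa
]

# Visual order: compare the stored codepoints as they are.
visual = sorted(words)

# Approximate logical order: compare after moving leading vowels.
logical = sorted(words, key=to_logical_order)

print(visual)
print(logical)
```

    In visual order the word starting with a leading vowel lands after every
    consonant-initial word; after the swap it is grouped with the other words
    beginning with ko kai.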






    This archive was generated by hypermail 2.1.5 : Thu Dec 04 2003 - 22:57:21 EST