Re: Sort Order

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Dec 04 2003 - 16:48:36 EST


    Gupta, Rohit4 wrote:
    > We are using UNICODE for representing Japanese characters.
    >
    > Will the Japanese characters be sorted according to:
    > a) Their order in the Japanese character set, OR
    > b) The order of their listing in the UNICODE representation, OR
    > c) Will the result of the two approaches above be the same?

    If you sort Unicode strings in their binary order, then you get b).
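As a minimal sketch of what binary order means in practice, assuming Java: `String.compareTo` compares UTF-16 code units, which matches code point order for the characters used here (it can differ for supplementary characters outside the BMP). The class and method names are illustrative only and are not part of any library.

```java
import java.util.Arrays;

class BinaryOrderDemo {
    // Sorts a copy of the input with String.compareTo, i.e. by UTF-16 code unit value.
    static String[] sortBinary(String... items) {
        String[] copy = items.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortBinary(
            "\u65E5",  // Kanji "hi" (day)
            "\u30A2",  // Katakana A
            "\u3042",  // Hiragana A
            "\u4E9C",  // Kanji "a", listed first among the Kanji in JIS X 0208
            "\u4E00"); // Kanji "ichi" (one)
        // Print code points rather than glyphs so the output does not depend on console encoding.
        for (String s : sorted) {
            System.out.printf("U+%04X%n", s.codePointAt(0));
        }
    }
}
```

The result is ordered U+3042, U+30A2, U+4E00, U+4E9C, U+65E5. Note that U+4E00 (一) comes before U+4E9C (亜), although JIS X 0208 lists 亜 first; this is the difference between (a) and (b).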

    If you use the Unicode Collation Algorithm (UCA), then you get better results for Katakana/Hiragana,
    but Kanji/Han characters are still sorted in Unicode order (b). See http://www.unicode.org/reports/tr10/

    ICU implements UCA and also provides a Japanese tailoring for JIS X 4061 order. With this tailoring,
    Kanji are sorted in JIS X 0208 order (a). To use it, you simply instantiate a Collator object for the
    locale ID "ja" and use it to compare strings or to generate sort keys. This is supported in both the
    C/C++ and Java versions of ICU.
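The compare-or-sort-key pattern can be sketched as follows. To keep the example runnable without extra dependencies, it uses the JDK's own `java.text.Collator` as a stand-in, not ICU, so its Japanese ordering is not guaranteed to match ICU's JIS X 4061 tailoring. ICU4J's `com.ibm.icu.text.Collator` offers analogous `getInstance`, `compare`, and `getCollationKey` calls. The class and helper names are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class JapaneseCollationDemo {
    // Stand-in: the JDK collator for Japanese, not the ICU "ja" tailoring.
    static Collator japaneseCollator() {
        return Collator.getInstance(Locale.JAPANESE);
    }

    // Sorts a copy of the input using the collator as the comparator.
    static String[] sortJapanese(String... items) {
        String[] copy = items.clone();
        Arrays.sort(copy, japaneseCollator());
        return copy;
    }

    // Sort keys must order two strings the same way as a direct compare() call.
    static boolean keysAgree(String a, String b) {
        Collator collator = japaneseCollator();
        CollationKey keyA = collator.getCollationKey(a);
        CollationKey keyB = collator.getCollationKey(b);
        return Integer.signum(collator.compare(a, b)) == Integer.signum(keyA.compareTo(keyB));
    }

    public static void main(String[] args) {
        // Katakana A, Hiragana A, Kanji "hi" (day)
        String[] sorted = sortJapanese("\u30A2", "\u3042", "\u65E5");
        for (String s : sorted) {
            System.out.printf("U+%04X%n", s.codePointAt(0));
        }
        System.out.println(keysAgree("\u3042", "\u30A2"));
    }
}
```

Direct comparison suits a one-off sort; sort keys are usually worth generating when the same strings are compared many times, since key comparison is a plain byte-wise comparison.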

    http://oss.software.ibm.com/icu/userguide/Collate_Intro.html
    http://oss.software.ibm.com/icu/

    Best regards,
    markus


