Re: Sort Order

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Dec 04 2003 - 16:48:36 EST

Next message: Kenneth Whistler: "Re: Compression through normalization"

Previous message: John Jenkins: "Re: OT: Free Fonts"
In reply to: Gupta, Rohit4: "Sort Order"
Next in thread: Kenneth Whistler: "Re: Sort Order"

Gupta, Rohit4 wrote:
> We are using UNICODE for representing Japanese characters.
>
> Will the Japanese characters be sorted according to:
> a) Their order in the Japanese character set, OR
> b) The order of their listing in the UNICODE representation, OR
> c) Will the results of the two approaches above be the same?

If you sort Unicode strings in their binary (code point) order, then you get (b).
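
To make (b) concrete, here is a minimal Java sketch (not part of the original message) that sorts a few strings by code point. The class name CodePointOrder and the sample words are made up for illustration. Java's String.compareTo compares UTF-16 code units, which agrees with code point order only while no supplementary characters are involved, so the sketch compares code points explicitly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, for illustration only.
class CodePointOrder {
    // Compares two strings code point by code point (not UTF-16 code unit).
    static final Comparator<String> BY_CODE_POINT = (a, b) -> {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    };

    // Returns a sorted copy; the input list is left untouched.
    static List<String> sortByCodePoint(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(BY_CODE_POINT);
        return copy;
    }

    public static void main(String[] args) {
        // Sample words: Tokyo (U+6771 U+4EAC), Nihon (U+65E5 U+672C),
        // katakana A (U+30A2), hiragana A (U+3042), kanji A (U+4E9C).
        List<String> words = List.of(
            "\u6771\u4EAC", "\u65E5\u672C", "\u30A2", "\u3042", "\u4E9C");
        System.out.println(sortByCodePoint(words));
    }
}
```

With this sample list, hiragana A (U+3042) comes first, then katakana A (U+30A2), then the kanji in code point order: U+4E9C, U+65E5 U+672C, U+6771 U+4EAC. Where each kanji lands is decided by its code point value alone, not by any Japanese reading.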

If you use the Unicode Collation Algorithm (UCA), then Katakana and Hiragana sort more sensibly
(corresponding kana are grouped together), but Kanji/Han characters still come out in Unicode
order (b). See http://www.unicode.org/reports/tr10/

ICU implements UCA and also provides a Japanese tailoring for JIS X 4061 order. Kanji are sorted in
JIS X 0208 order (a). For this you simply instantiate a Collator object for the locale ID "ja" and
use it to compare strings or to generate sort keys. This is supported in both the C/C++ and Java
versions of ICU.

http://oss.software.ibm.com/icu/userguide/Collate_Intro.html
http://oss.software.ibm.com/icu/
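
As a rough sketch of the "instantiate a Collator for ja, then compare strings or generate sort keys" pattern described above: the code below uses the JDK's built-in java.text.Collator as a stand-in so that it runs without extra jars. That is an assumption of this example, not what the message recommends. ICU4J's com.ibm.icu.text.Collator offers getInstance, compare and getCollationKey under the same names, and it is the ICU collator that you would want for the JIS X 4061 tailoring. The class name JapaneseCollation and its helper methods are hypothetical, and how closely the JDK's own Japanese data follows JIS order is not something this sketch checks.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class JapaneseCollation {
    // Collator for the "ja" locale (JDK implementation, not ICU).
    static Collator japaneseCollator() {
        return Collator.getInstance(Locale.JAPANESE);
    }

    // Sort by calling Collator.compare for every comparison.
    static List<String> sort(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(japaneseCollator()); // Collator is a Comparator<Object>
        return copy;
    }

    // Sort via precomputed sort keys; this pays off when the same
    // strings take part in many comparisons.
    static List<String> sortWithKeys(List<String> input) {
        Collator collator = japaneseCollator();
        List<CollationKey> keys = new ArrayList<>();
        for (String s : input) {
            keys.add(collator.getCollationKey(s));
        }
        keys.sort(Comparator.naturalOrder());
        List<String> out = new ArrayList<>();
        for (CollationKey key : keys) {
            out.add(key.getSourceString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample input: hiragana KA (U+304B), hiragana A (U+3042).
        List<String> words = List.of("\u304B", "\u3042");
        System.out.println(sort(words));
        System.out.println(sortWithKeys(words));
    }
}
```

Both methods should give the same order; comparing directly is simpler for a one-off sort, while sort keys are the usual choice when the keys can be stored and reused.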

Best regards,
markus
