From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Dec 04 2003 - 12:04:14 EST
Gupta, Rohit4 wrote:
> We are using UNICODE for representing Japanese characters.
> Will the Japanese characters be sorted according to:
> a) Their order in the Japanese character set OR
Impossible: Hiragana and Katakana characters are already mapped in Unicode.
Don't expect any change in the assigned code points, whose order matches the
order found in the original Japanese standard from which they were derived.
If you are speaking about the collation order of Han characters shared with
other East Asian languages, that won't be possible either, as an ordering rule
that works for Japanese would not simultaneously satisfy users of
Traditional Chinese, Simplified Chinese, Traditional Korean or
Traditional Vietnamese.
Even in Chinese, there exist several conventions for sorting characters (by
stroke count, by radical/strokes, by pinyin...).
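To make the point concrete, here is a rough Python sketch showing that one
set of Han characters comes out in three different orders depending on the
convention. The HAN_INFO table and the function names are made up for this
illustration (a real application would take such properties from a proper
data source), and the pinyin is written without tone marks for simplicity.

```python
# Illustrative data only: (stroke count, toneless pinyin) for five characters.
# This small table is an assumption for the example, not a real database.
HAN_INFO = {
    "中": (4, "zhong"),
    "人": (2, "ren"),
    "大": (3, "da"),
    "山": (3, "shan"),
    "水": (4, "shui"),
}


def by_codepoint(chars):
    """Plain code point order, which is what sorted() gives for str."""
    return sorted(chars)


def by_strokes(chars):
    """Order by stroke count, breaking ties by code point."""
    return sorted(chars, key=lambda c: (HAN_INFO[c][0], c))


def by_pinyin(chars):
    """Order by toneless pinyin reading, breaking ties by code point."""
    return sorted(chars, key=lambda c: (HAN_INFO[c][1], c))


chars = list(HAN_INFO)
print("code point:", by_codepoint(chars))
print("strokes:   ", by_strokes(chars))
print("pinyin:    ", by_pinyin(chars))
```

All three results differ, so no single fixed order can serve every convention.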
> b) Order of their listing in the UNICODE representation. OR
Impossible also: charts are ordered consistently by their code points, and
the canonical ordering and composition of Unicode strings are also fixed now.
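As a minimal sketch of what pure code point order does to kana, the Python
below compares it with a simplified reading-based key. The helper names
kana_fold and kana_sorted are hypothetical, and the folding (shifting
Katakana U+30A1..U+30F6 down by 0x60 onto Hiragana) is only a crude
stand-in for real collation, not the UCA or any Japanese standard.

```python
def kana_fold(text):
    """Map Katakana U+30A1..U+30F6 onto the matching Hiragana (offset 0x60)."""
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in text
    )


def codepoint_sorted(words):
    """Binary order: every Hiragana letter sorts before every Katakana letter."""
    return sorted(words)


def kana_sorted(words):
    """Simplified key: compare folded kana first, then the original string."""
    return sorted(words, key=lambda w: (kana_fold(w), w))


words = ["カ", "あ", "ア", "か"]
print(codepoint_sorted(words))
print(kana_sorted(words))
```

Code point order gives あ か ア カ (the two scripts kept apart), while the
folded key gives あ ア か カ (the same syllables kept together).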
> c) The result of the two approaches above be the same.
No solution for you if you want either (a) or (b). You would do better to
look at the UCA collation rules, which can be tailored using the default
collation order (DUCET) as a convenient base to create derived sort orders,
and you may also use the Unihan database to get additional character
properties for ideographic characters (notably their classification and
compatibility with existing ideographic standards and repertoires).
If you want better responses, you'll need to be more explicit about what
does not fit your needs with the current representation of Japanese texts
with Unicode.
This archive was generated by hypermail 2.1.5 : Thu Dec 04 2003 - 13:24:50 EST