From: Satoshi Nakagawa (psychs@limechat.net)
Date: Mon Oct 12 2009 - 14:29:00 CDT
I have checked the Unicode CLDR collation data, but it contains data
only for the tertiary strength.
IMHO, for example, [っ] (U+3063) and [つ] (U+3064) should be treated as
different characters at the primary strength, because they are never
treated as the same character in Japanese, even though they have similar
glyphs.
I would suggest modifying the Default Unicode Collation Element Table.
In http://www.unicode.org/Public/UCA/latest/allkeys.txt,
3063 ; [.27B0.0020.000D.3063] # HIRAGANA LETTER SMALL TU
3064 ; [.27B0.0020.000E.3064] # HIRAGANA LETTER TU
30C3 ; [.27B0.0020.000F.30C3] # KATAKANA LETTER SMALL TU
FF6F ; [.27B0.0020.0010.FF6F] # HALFWIDTH KATAKANA LETTER SMALL TU; QQK
30C4 ; [.27B0.0020.0011.30C4] # KATAKANA LETTER TU
FF82 ; [.27B0.0020.0012.FF82] # HALFWIDTH KATAKANA LETTER TU; QQK
32E1 ; [.27B0.0020.0013.32E1] # CIRCLED KATAKANA TU; QQK
3065 ; [.27B0.0020.000E.3064][.0000.018B.0002.3099] # HIRAGANA LETTER DU; QQCM
30C5 ; [.27B0.0020.0011.30C4][.0000.018B.0002.3099] # KATAKANA LETTER DU; QQCM
this part specifies that [っ] (U+3063) and [つ] (U+3064) are treated as the
same character at both the primary strength and the secondary strength.
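
To make this concrete, here is a minimal Python sketch that parses two of
the entries quoted above and compares their weights level by level. The
names parse_allkeys and equal_up_to are made up for this illustration and
are not part of any library; the parser assumes the
[.primary.secondary.tertiary.codepoint] layout shown in the excerpt and
only handles entries this simple.

```python
import re

# Two lines copied from the allkeys.txt excerpt quoted above.
ALLKEYS_EXCERPT = """\
3063 ; [.27B0.0020.000D.3063] # HIRAGANA LETTER SMALL TU
3064 ; [.27B0.0020.000E.3064] # HIRAGANA LETTER TU
"""

# One collation element: [.primary.secondary.tertiary.codepoint]
# (assumed layout, taken from the excerpt; "*" marks variable elements).
ELEMENT_RE = re.compile(
    r"\[[.*]([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\.[0-9A-F]{4,6}\]"
)


def parse_allkeys(text):
    """Map each code point sequence to its (primary, secondary, tertiary) weights."""
    table = {}
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not entry or entry.startswith("@"):
            continue
        code_points, elements = entry.split(";", 1)
        key = tuple(int(cp, 16) for cp in code_points.split())
        table[key] = [
            tuple(int(w, 16) for w in m.groups())
            for m in ELEMENT_RE.finditer(elements)
        ]
    return table


def equal_up_to(table, a, b, level):
    """True if single characters a and b have identical weights at levels 1..level."""
    wa = [ce[:level] for ce in table[(ord(a),)]]
    wb = [ce[:level] for ce in table[(ord(b),)]]
    return wa == wb


table = parse_allkeys(ALLKEYS_EXCERPT)
for level, name in [(1, "primary"), (2, "secondary"), (3, "tertiary")]:
    print(name, equal_up_to(table, "\u3063", "\u3064", level))
```

With these two entries, the characters compare equal at the primary and
secondary levels and differ only at the tertiary level (000D versus 000E).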
My suggestion is the following:
3063 ; [.3267.0020.000D.3063] # HIRAGANA LETTER SMALL TU
3064 ; [.27B0.0020.000E.3064] # HIRAGANA LETTER TU
30C3 ; [.3267.0020.000F.30C3] # KATAKANA LETTER SMALL TU
FF6F ; [.3267.0020.0010.FF6F] # HALFWIDTH KATAKANA LETTER SMALL TU; QQK
30C4 ; [.27B0.0020.0011.30C4] # KATAKANA LETTER TU
FF82 ; [.27B0.0020.0012.FF82] # HALFWIDTH KATAKANA LETTER TU; QQK
32E1 ; [.27B0.0020.0013.32E1] # CIRCLED KATAKANA TU; QQK
3065 ; [.27B0.0020.000E.3064][.0000.018B.0002.3099] # HIRAGANA LETTER DU; QQCM
30C5 ; [.27B0.0020.0011.30C4][.0000.018B.0002.3099] # KATAKANA LETTER DU; QQCM
Then [っ] (U+3063) and [つ] (U+3064) are always treated as different characters.
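
The effect on primary-strength sorting can be sketched in Python as
follows. This is only an illustration, not a UCA implementation: the two
dictionaries hold just the primary weights from the tables in this mail,
and primary_key is a hypothetical helper that builds a sort key from
primary weights alone.

```python
# Primary weights taken from the two tables quoted in this mail.
CURRENT_PRIMARY = {"\u3063": 0x27B0, "\u3064": 0x27B0}   # small tu, tu
PROPOSED_PRIMARY = {"\u3063": 0x3267, "\u3064": 0x27B0}  # small tu, tu


def primary_key(text, primary_weights):
    """Primary-strength sort key: the sequence of non-zero primary weights."""
    return tuple(primary_weights[ch] for ch in text if primary_weights[ch] != 0)


# Three strings built only from small tu (U+3063) and tu (U+3064).
words = ["\u3063\u3064", "\u3064\u3063", "\u3064\u3064"]

# With the current weights, every string gets the same primary key.
current_keys = {primary_key(w, CURRENT_PRIMARY) for w in words}

# With the proposed weights, the keys differ and the order is well defined.
proposed_order = sorted(words, key=lambda w: primary_key(w, PROPOSED_PRIMARY))

print(len(current_keys))
print(proposed_order)
```

With the current weights all three strings collapse to one primary key, so
a primary-strength comparison cannot tell them apart. With the proposed
weights they are distinct, and because 3267 is greater than 27B0, the
small letter sorts after the full-size one.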
This applies not only to [っ] and [つ]: all the character pairs in my last
mail should be modified in the same way.
Note that, as far as I know, the JIS standard does not specify a collation
algorithm or a sort order.
-- Satoshi Nakagawa