P11: Assamese letters sort order

Last updated: August 31, 2004

1.  Problem
2.  Discussion
Document History

1. Problem

In Assamese, the letter U+09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL sorts between U+09AF BENGALI LETTER YA and U+09B2 BENGALI LETTER LA (U+09B0 BENGALI LETTER RA is not used), and the letter U+09F1 BENGALI LETTER RA WITH LOWER DIAGONAL sorts between U+09B2 BENGALI LETTER LA and U+09B6 BENGALI LETTER SHA. Because both letters are encoded after U+09B6, sorting by code point does not yield the correct Assamese order.
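The mismatch can be reproduced with a minimal Python sketch. It relies only on the code points named above; the variable names are illustrative, and the "expected" list simply restates the order described in this section.

```python
# Letters named in the problem statement.
YA, LA, SHA = "\u09AF", "\u09B2", "\u09B6"
RA_MIDDLE, RA_LOWER = "\u09F0", "\u09F1"

# Order described above as correct for Assamese.
expected = [YA, RA_MIDDLE, LA, RA_LOWER, SHA]

# Python compares strings by code point, so sorted() yields binary order.
by_code_point = sorted(expected)


def names(chars):
    """Render each character as a U+XXXX label for display."""
    return ["U+%04X" % ord(c) for c in chars]


print(names(by_code_point))       # ['U+09AF', 'U+09B2', 'U+09B6', 'U+09F0', 'U+09F1']
print(by_code_point == expected)  # False
```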

2. Discussion

TDIL proposes to deprecate the existing characters and to re-encode them so that code point sorting works:

p9B1 BENGALI LETTER RA WITH MIDDLE DIAGONAL
 * used in Assamese
p9B5 BENGALI LETTER VA WITH MIDDLE DIAGONAL
 * used in Assamese and Manipuri

It is a long-standing position of the Unicode Standard that sorting by code point order is not a viable goal.

Furthermore, it is far more damaging to the standard to move (or deprecate and re-encode) characters than to require a more or less sophisticated sorting strategy, such as the Unicode Collation Algorithm (UCA).

The default collation table for the UCA already sorts Assamese correctly. From http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt:

 09AF  ; [.15C9.0020.0002.09AF] # BENGALI LETTER YA
 09DF  ; [.15C9.0020.0002.09AF][.0000.00FD.0002.09BC] # BENGALI LETTER YYA; QQCM
 09B0  ; [.15CA.0020.0002.09B0] # BENGALI LETTER RA
 09F0  ; [.15CB.0020.0002.09F0] # BENGALI LETTER RA WITH MIDDLE DIAGONAL
 09B2  ; [.15CC.0020.0002.09B2] # BENGALI LETTER LA
 09F1  ; [.15CD.0020.0002.09F1] # BENGALI LETTER RA WITH LOWER DIAGONAL
 09B6  ; [.15CE.0020.0002.09B6] # BENGALI LETTER SHA
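
As a rough illustration of how these weights produce the Assamese order, the Python sketch below parses the excerpt above and builds a simplified, level-by-level sort key. It is not a conforming UCA implementation: it skips normalization, contractions, and variable weighting, ignores the fourth field of each collation element, and covers only the seven characters in the excerpt. The names `parse`, `sort_key`, `EXCERPT`, and `TABLE` are hypothetical helpers introduced for this example.

```python
import re

# The allkeys-4.0.0.txt lines quoted above, copied verbatim.
EXCERPT = """\
09AF  ; [.15C9.0020.0002.09AF] # BENGALI LETTER YA
09DF  ; [.15C9.0020.0002.09AF][.0000.00FD.0002.09BC] # BENGALI LETTER YYA; QQCM
09B0  ; [.15CA.0020.0002.09B0] # BENGALI LETTER RA
09F0  ; [.15CB.0020.0002.09F0] # BENGALI LETTER RA WITH MIDDLE DIAGONAL
09B2  ; [.15CC.0020.0002.09B2] # BENGALI LETTER LA
09F1  ; [.15CD.0020.0002.09F1] # BENGALI LETTER RA WITH LOWER DIAGONAL
09B6  ; [.15CE.0020.0002.09B6] # BENGALI LETTER SHA
"""

# One collation element: [.primary.secondary.tertiary.fourth]
ELEMENT = re.compile(
    r"\[[.*]([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\.[0-9A-F]+\]"
)


def parse(excerpt):
    """Map each character to its list of (primary, secondary, tertiary) weights."""
    table = {}
    for line in excerpt.splitlines():
        entry = line.split("#", 1)[0]  # drop the trailing comment
        if ";" not in entry:
            continue
        code, elements = entry.split(";", 1)
        char = chr(int(code.strip(), 16))
        table[char] = [
            tuple(int(w, 16) for w in m.groups())
            for m in ELEMENT.finditer(elements)
        ]
    return table


def sort_key(text, table):
    """Simplified key: all primary weights, then secondary, then tertiary.

    Zero weights are skipped at each level. Characters missing from the
    table raise KeyError, since this sketch only knows the excerpt.
    """
    elements = [ce for ch in text for ce in table[ch]]
    return tuple(
        tuple(ce[level] for ce in elements if ce[level])
        for level in range(3)
    )


TABLE = parse(EXCERPT)

letters = ["\u09B6", "\u09F1", "\u09B2", "\u09F0", "\u09AF"]
collated = sorted(letters, key=lambda s: sort_key(s, TABLE))
print(["U+%04X" % ord(c) for c in collated])
```

Sorting the five letters from the problem statement with this key places U+09F0 between U+09AF and U+09B2, and U+09F1 between U+09B2 and U+09B6, because the primary weights 15C9 through 15CE already follow the Assamese order.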

Document History

Revision  Date             Comments
1         August 31, 2004  Initial version