Public Review Issues

178	Collation Rules for Non-Latin Scripts in Unicode CLDR	2011.05.02
Status:	Closed
Originator:	CLDR-TC
Resolution:	The committee is considering the feedback.

Description of Issue:

In Unicode CLDR 1.9 and earlier versions, the collation order ("alphabetic order") for a given language changes the order of characters within a script, but doesn't change the order of scripts. For example, although Ukrainian has a specific order for Cyrillic characters, all Cyrillic characters still come after all Latin characters, as in the default UCA order: Latin Greek Coptic Cyrillic Glagolitic Georgian Armenian ... For the full ordering that illustrates this, see http://www.unicode.org/charts/collation/.

In Unicode CLDR 2.0 and later versions, there is the capability to customize the collation order by re-ordering one or more scripts with respect to other scripts.

The proposal addressed by this Public Review Issue is to change the customized collation order for certain languages in the Unicode CLDR data tables. These data tables are used by software to sort characters according to user expectations for those languages. The languages in question are those that use non-Latin scripts; their collation data would be changed so that the "native script(s)" are ordered before Latin. Thus for Russian, the order would be Cyrillic, then Latin, Greek, etc. For Thai, it would be Thai, then Latin, Greek, etc. Any scripts that were not "moved up" would still be in default UCA order.
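
The effect of moving a script up can be illustrated with a small sketch. It is not the UCA or CLDR algorithm: the script of a character is guessed from the first word of its Unicode character name, code points stand in for real collation weights, and the names DEFAULT_ORDER, script_of, script_order and sort_key are hypothetical helpers introduced only for this example.

```python
import unicodedata

# Leading part of the default script order quoted in the issue text.
DEFAULT_ORDER = ["LATIN", "GREEK", "COPTIC", "CYRILLIC",
                 "GLAGOLITIC", "GEORGIAN", "ARMENIAN"]


def script_of(ch):
    """Rough script guess: first word of the Unicode character name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_order(moved_up):
    """Scripts in moved_up come first; the rest keep the default order."""
    return list(moved_up) + [s for s in DEFAULT_ORDER if s not in moved_up]


def sort_key(word, order):
    """Toy key: (script rank, code point) per character.

    Scripts missing from the order sort after all listed scripts.
    Code points are a stand-in for real collation weights.
    """
    rank = {script: i for i, script in enumerate(order)}
    return [(rank.get(script_of(ch), len(order)), ord(ch)) for ch in word]


words = ["zebra", "яблоко", "alpha", "βήτα", "борщ"]

# Default behaviour: Latin, then Greek, then Cyrillic.
default_sorted = sorted(words, key=lambda w: sort_key(w, script_order([])))

# Proposed behaviour for Russian: Cyrillic first, then default order.
russian_sorted = sorted(
    words, key=lambda w: sort_key(w, script_order(["CYRILLIC"]))
)

print(default_sorted)  # ['alpha', 'zebra', 'βήτα', 'борщ', 'яблоко']
print(russian_sorted)  # ['борщ', 'яблоко', 'alpha', 'zebra', 'βήτα']
```

Only the relative position of whole scripts changes between the two results; the order of words within each script is the same in both.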

The CLDR-TC is soliciting feedback on this proposal to see whether the proposed changes would be inappropriate for any particular language. In some cases, general user expectations for a language may be to have Latin ordered first, or to have Latin followed by the native script(s), followed by all other scripts.

Technically, the required change would consist of adding a script-reordering rule to the collation data for each affected language.

Languages which have simple writing systems would require the addition of one script re-ordering rule. For example, Greek and Serbian (Cyrillic) would require the addition of the following rules, respectively:

el: [reorder Grek]
sr-Cyrl: [reorder Cyrl]

Languages which use multiple scripts would use a re-ordering rule listing more than one script, for example:

sr-Latn: [reorder Latn Cyrl]
ja: [reorder Kana Hani]
ko: [reorder Hang Hani]
zh: [reorder Hani Bopo]
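
As a rough illustration of how the rule lines above could be read, the following sketch parses a "locale: [reorder ...]" line into a locale identifier and a list of four-letter script codes, then expands the list into an effective script order. The parser, the function names, and the truncated DEFAULT_SCRIPTS list (the first scripts of the default order quoted in the description, written as ISO 15924 codes) are assumptions made for this example; they are not part of CLDR or of any collation library.

```python
import re

# Matches lines such as "sr-Latn: [reorder Latn Cyrl]".
RULE = re.compile(
    r"^\s*([A-Za-z-]+):\s*\[reorder((?:\s+[A-Za-z]{4})+)\s*\]\s*$"
)

# Truncated default order from the description, as ISO 15924 codes.
DEFAULT_SCRIPTS = ["Latn", "Grek", "Copt", "Cyrl", "Glag", "Geor", "Armn"]


def parse_reorder(line):
    """Return (locale, [script codes]) for one reorder rule line."""
    match = RULE.match(line)
    if match is None:
        raise ValueError(f"not a reorder rule: {line!r}")
    return match.group(1), match.group(2).split()


def effective_order(reordered):
    """Listed scripts first, remaining scripts in default order."""
    rest = [s for s in DEFAULT_SCRIPTS if s not in reordered]
    return list(reordered) + rest


for rule in ["el: [reorder Grek]", "sr-Cyrl: [reorder Cyrl]",
             "sr-Latn: [reorder Latn Cyrl]"]:
    locale, scripts = parse_reorder(rule)
    print(locale, effective_order(scripts))
```

For sr-Latn this yields Latn, Cyrl, Grek, Copt, Glag, Geor, Armn: Cyrillic is pulled up directly behind Latin, and everything not named in the rule keeps its default relative order.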

Comments on this public review issue must be submitted using the Unicode CLDR bug reporting form. Please be sure to indicate the number and title of the issue you are providing feedback for, and try to be as explicit as possible in your suggestions.