Request for change to Eastern European language collations

We have received the following feedback on the collation of several Eastern European languages. Please review the proposed changes and, if there is any reason not to make them, let us know as soon as possible.

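Remember that the ICU rules (which correspond to CLDR syntax) list only the differences from the default ordering shown at http://www.unicode.org/charts/collation/. For more on data formats, see http://www.unicode.org/cldr/data_formats.html#Collation. In each language section below, the first block of rules is the CLDR 1.0 data, the second is the suggested change, and the numbered notes are comments.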
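Because a tailoring lists only differences, the resulting alphabet can be pictured as the default order with each tailored letter inserted directly after its reset anchor. The sketch below illustrates that idea under stated assumptions. `apply_primary_tailoring` is a hypothetical helper. The base is a plain a-z list standing in for the UCA, and anchors must be single letters already present in that base. Only primary `<` relations are modeled: `<<` and `<<<` variants are skipped because they share the preceding element's primary weight. This is not how ICU builds its tables.

```python
def apply_primary_tailoring(base, rules):
    """Return the primary-level letter order after applying a tailoring.

    Simplified model (an assumption, not ICU's algorithm): only '&' resets
    and '<' primary relations change the order; '<<' and '<<<' variants
    share the primary weight of the element before them and are skipped.
    """
    order = list(base)
    pos = None  # index of the element the next '<' item should follow
    for raw in rules.splitlines():
        line = raw.strip()
        if line.startswith("&"):
            # Reset: continue from the anchor's position in the current order.
            pos = order.index(line[1:].strip().lower())
        elif line.startswith("<<"):
            continue  # secondary or tertiary variant, no new primary letter
        elif line.startswith("<"):
            pos += 1
            order.insert(pos, line[1:].strip())
    return order


# Hypothetical base: plain a-z standing in for the default (UCA) order.
base = list("abcdefghijklmnopqrstuvwxyz")

# The suggested Slovenian rules from this document.
slovenian = """
& C
< č
   <<< Č
< ć
   <<< Ć
& S
< š
   <<< Š
& Z
< ž
   <<< Ž
"""

print(" ".join(apply_primary_tailoring(base, slovenian)))
# a b c č ć d e f g h i j k l m n o p q r s š t u v w x y z ž
```

When applied to the suggested Slovenian rules, the helper places ć directly after č, and š and ž directly after their base letters.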

Croatian - hr_HR

CLDR 1.0

Suggested Change

Comments

dž

Dž

dž

Dž

& C
< č
   <<< Č
< ć
   <<< Ć
& Đ
< dž
   <<< Dž
   <<< DŽ
& L
< lj
   <<< Lj
   <<< LJ
& N
< nj
   <<< Nj
   <<< NJ
& S
< š
   <<< Š
& Z
< ž
   <<< Ž

& C
< č
   <<< Č
< ć
   <<< Ć
& D
< dž
   <<< Dž
   <<< DŽ
& L
< lj
   <<< Lj
   <<< LJ
& N
< nj
   <<< Nj
   <<< NJ
& S
< š
   <<< Š
& Z
< ž
   <<< Ž

1. Changing the reset from Đ to D will put dž before Đ instead of after it.
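A toy primary-level sort can check the effect described in comment 1. This is a sketch under several assumptions. The alphabet lists, `primary_key`, and the sample words are illustrative only. Every d+ž sequence is treated as the digraph dž, which real Croatian text does not always do. Case is folded, and no letters beyond those listed are supported.

```python
# Toy primary-level model of the Croatian alphabet (an assumption for
# illustration, not ICU data). The only difference between the two
# versions is where the digraph dž falls relative to đ.
LETTERS = "a b c č ć d {d_group} e f g h i j k l lj m n nj o p r s š t u v z ž"
CLDR_1_0 = LETTERS.format(d_group="đ dž").split()   # & Đ < dž
SUGGESTED = LETTERS.format(d_group="dž đ").split()  # & D < dž


def primary_key(word, alphabet):
    """Map a word to primary weights, matching digraphs (dž, lj, nj) first."""
    units = sorted(alphabet, key=len, reverse=True)
    word = word.lower()
    key, i = [], 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                key.append(alphabet.index(unit))
                i += len(unit)
                break
        else:
            raise ValueError(f"no weight for {word[i]!r}")
    return key


words = ["đak", "ekran", "džep", "dan"]
print(sorted(words, key=lambda w: primary_key(w, CLDR_1_0)))   # ['dan', 'đak', 'džep', 'ekran']
print(sorted(words, key=lambda w: primary_key(w, SUGGESTED)))  # ['dan', 'džep', 'đak', 'ekran']
```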

Romanian - ro_RO

CLDR 1.0

Suggested Change

Comments

...

& A
< ă
   <<< Ă
& D
< đ
   <<< Đ
& I
< î
   <<< Î
& S
< ş
   <<< Ş
& Þ
< ţ
   <<< Ţ
& Z
< ż
   <<< Ż

& A
< â
   <<< Â
< ă
   <<< Ă
& D
< đ
   <<< Đ
& I
< î
   <<< Î
& S
< ş
   <<< Ş
& T
< ţ
   <<< Ţ
& Z
< ż
   <<< Ż

1. Adding â as its own letter, ordered between a and ă. Note: this change will cause â to differ from a at the primary level, not just at the secondary level as an accented variant. It would therefore produce the following ordering:

a < ax < â < âx, not a < â < ax < âx

Can someone confirm that this is the desired ordering?

If so, we probably also need to add â to the exemplar characters.

2. Moving ţ to sort after T instead of after Z.
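The two orderings in comment 1 depend on the level at which â differs from a. The minimal two-level sketch below uses hypothetical names and a reduced alphabet, with x standing for any following letter. In CLDR 1.0, â is treated as a with a secondary accent mark, so accents are consulted only when the base letters of the whole string tie. The suggested rules give â its own primary weight.

```python
# Toy two-level model (an assumption for illustration): a word's key is
# (primary weights, secondary weights), and the secondary weights are only
# consulted when the primary weights of the whole string are equal.
ALPHABET_1_0 = ["a", "ă", "x"]       # CLDR 1.0: â is not tailored
ALPHABET_NEW = ["a", "â", "ă", "x"]  # suggested: & A < â < ă


def sort_key(word, alphabet, accented):
    """accented maps a letter to the base letter it is a secondary variant of."""
    primary, secondary = [], []
    for ch in word:
        if ch in accented:
            primary.append(alphabet.index(accented[ch]))
            secondary.append(1)  # accent: secondary difference only
        else:
            primary.append(alphabet.index(ch))
            secondary.append(0)
    return (primary, secondary)


words = ["âx", "ax", "â", "a"]
print(sorted(words, key=lambda w: sort_key(w, ALPHABET_1_0, {"â": "a"})))
# ['a', 'â', 'ax', 'âx']
print(sorted(words, key=lambda w: sort_key(w, ALPHABET_NEW, {})))
# ['a', 'ax', 'â', 'âx']
```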

Polish -- pl_PL

CLDR 1.0

Suggested Change

Comments

& A
< ą
   <<< Ą
& C
< ć
   <<< Ć
& E
< ę
   <<< Ę
& L
< ł
   <<< Ł
& N
< ń
   <<< Ń
& O
< ó
   <<< Ó
& S
< ś
   <<< Ś
& Z
< ź
   <<< Ź
< ż
   <<< Ż

& A
< ą
   <<< Ą
& C
< ć
   <<< Ć
& E
< ę
   <<< Ę
& L
< ł
   <<< Ł
& N
< ń
   <<< Ń
& O
< ó
   <<< Ó
& S
< ś
   <<< Ś
& Z
< ż
   <<< Ż
< ź
   <<< Ź

1. Change the order of ź and ż, so that ż sorts before ź.

Serbo-Croatian -- sh_YU

CLDR 1.0

Suggested Change

Comments

dž

Dž

dž

Dž

& C
< č
   <<< Č
< ć
   <<< Ć
& Đ
< dž
   <<< Dž
   <<< DŽ
& L
< lj
   <<< Lj
   <<< LJ
& N
< nj
   <<< Nj
   <<< NJ
& S
< š
   <<< Š
& Z
< ž
   <<< Ž

& C
< č
   <<< Č
< ć
   <<< Ć
& D
< dž
   <<< Dž
   <<< DŽ
& L
< lj
   <<< Lj
   <<< LJ
& N
< nj
   <<< Nj
   <<< NJ
& S
< š
   <<< Š
& Z
< ž
   <<< Ž

1. Changing the reset from Đ to D will put dž before Đ instead of after it (same as Croatian).

Slovenian -- sl_SI

CLDR 1.0

Suggested Change

Comments

& C
< č
   <<< Č
& S
< š
   <<< Š
& Z
< ž
   <<< Ž

& C
< č
   <<< Č
< ć
   <<< Ć
& S
< š
   <<< Š
& Z
< ž
   <<< Ž

1. Add ć after č.

2. In the UCA, đ should already sort after d with a primary difference, so we should treat this as a request to add đ as an exemplar character.

Note

The changes above are based on an Excel chart that was supplied to us. Our one concern is that the chart does not list all of the collation rules in ICU. Hungarian is an example. The list labeled "Excel" below comes from the spreadsheet. The ICU rules include combinations that the chart omits, such as DZS and CCS. According to the information we have, these sequences behave as contractions in sorting.

(Note also that the ICU rules explicitly specify the strength of each difference.)

Excel

ICU

& C
< cs
   <<< Cs
   <<< CS
& D
< dz
   <<< Dz
   <<< DZ
& DZ
< dzs
   <<< Dzs
   <<< DZS
& G
< gy
   <<< Gy
   <<< GY
& L
< ly
   <<< Ly
   <<< LY
& N
< ny
   <<< Ny
   <<< NY
& S
< sz
   <<< Sz
   <<< SZ
& T
< ty
   <<< Ty
   <<< TY
& Z
< zs
   <<< Zs
   <<< ZS

& O
< ö
   <<< Ö
  << ő
   <<< Ő
& U
< ü
   <<< Ü
  << ű
   <<< Ű

& cs
   <<< ccs  /  cs
& Cs
   <<< Ccs  /  cs
& CS
   <<< CCS  /  CS
& dz
   <<< ddz  /  dz
& Dz
   <<< Ddz  /  dz
& DZ
   <<< DDZ  /  DZ
& dzs
   <<< ddzs  /  dzs
& Dzs
   <<< Ddzs  /  dzs
& DZS
   <<< DDZS  /  DZS
& gy
   <<< ggy  /  gy
& Gy
   <<< Ggy  /  gy
& GY
   <<< GGY  /  GY
& ly
   <<< lly  /  ly
& Ly
   <<< Lly  /  ly
& LY
   <<< LLY  /  LY
& ny
   <<< nny  /  ny
& Ny
   <<< Nny  /  ny
& NY
   <<< NNY  /  NY
& sz
   <<< ssz  /  sz
& Sz
   <<< Ssz  /  sz
& SZ
   <<< SSZ  /  SZ
& ty
   <<< tty  /  ty
& Ty
   <<< Tty  /  ty
& TY
   <<< TTY  /  TY
& zs
   <<< zzs  /  zs
& Zs
   <<< Zzs  /  zs
& ZS
   <<< ZZS  /  ZS
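The ICU rules above combine two mechanisms. Contractions make cs, dz, dzs, gy, ly, ny, sz, ty, and zs sort as single letters. Expansions such as `& cs <<< ccs / cs` make a doubled digraph like ccs sort as cs followed by cs, differing only at the tertiary level. The toy primary-level sketch below models both. The alphabet, function names, and sample words are assumptions for illustration, and greedy longest matching will mis-split some real compound words.

```python
# Toy primary-level model of the Hungarian rules above (an assumption for
# illustration: lowercase only, no accented vowels, greedy longest match).
ALPHABET = ("a b c cs d dz dzs e f g gy h i j k l ly m n ny "
            "o p q r s sz t ty u v w x y z zs").split()

# Doubled digraphs sort like the digraph written twice, e.g. ccs -> cs cs,
# mirroring ICU rules of the form "& cs <<< ccs / cs".
DIGRAPHS = ["cs", "dz", "dzs", "gy", "ly", "ny", "sz", "ty", "zs"]
EXPANSIONS = {d[0] + d: [d, d] for d in DIGRAPHS}

# Longest units first so that dzs wins over dz, and ssz over sz.
UNITS = sorted(list(EXPANSIONS) + ALPHABET, key=len, reverse=True)


def collation_elements(word):
    """Split a lowercase word into primary collation elements."""
    out, i = [], 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                out.extend(EXPANSIONS.get(unit, [unit]))
                i += len(unit)
                break
        else:
            raise ValueError(f"no collation element for {word[i]!r}")
    return out


def primary_key(word):
    return [ALPHABET.index(el) for el in collation_elements(word)]


words = ["rossz", "rost", "rosta", "csak", "cukor", "cica"]
print(sorted(words))                   # code point order
# ['cica', 'csak', 'cukor', 'rossz', 'rost', 'rosta']
print(sorted(words, key=primary_key))  # toy Hungarian order
# ['cica', 'cukor', 'csak', 'rost', 'rosta', 'rossz']
print(collation_elements("meccs"))     # ['m', 'e', 'cs', 'cs']
```

In code point order, csak precedes cukor and rossz precedes rost. With the contractions and expansions, every c word sorts before cs, and rossz sorts as r-o-sz-sz, which places it after rost and rosta.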