ICU 2.0 Collation charts online

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Nov 02 2001 - 19:32:55 EST


Dear ICU users,

We have generated graphical charts that show the sorting order for many locales with the ICU 2.0 data: http://oss.software.ibm.com/icu/charts/collation/
They are intended to give an easier-to-read overview of the sorting order than the source data (which lives in CVS, in the locale-specific icu/data/*.txt files).

Please take a look at the charts and notify us of any problems, either by email or, if you are sure that something is wrong, by filing a bug. Please see our Contacts page at http://oss.software.ibm.com/icu/archives/index.html

Please note the following:

- Currently, many more characters are shown in each chart than are actually used in each language. This is because we show entire scripts with all variations. In the future, we will need to collect lists of characters that are actually used in a language in order to show simpler charts.
However, with the complete script charts, you may be able to see peculiarities that might be unintended.

- For characters that expand (shown in red), you need to look at the actual collation weights (fly-over text) to see how they really sort. For example, a sharp s (ß) sorts like ss, but the chart shows it as having a primary difference from s (just as ss itself has a primary difference from s). Our chart generation code cannot yet detect automatically that ß is similar to ss, so it does not show the lower-level difference between those two.

- All of the collation sequences are based on the Unicode Collation Algorithm table for "sorting everything". This means that many characters of the particular language and all of the characters of other languages follow the UCA order. We have a link to the UCA charts on unicode.org.
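
To illustrate the note about expanding characters, here is a minimal sketch in Python of how an expansion behaves under a multi-level comparison. It is not ICU code and does not use ICU data: the table TOY_ELEMENTS, its weights, and the helper names are all invented for this example. It only shows why ß compares equal to ss at the primary level, differs from it at a lower level, and still has a primary difference from a single s.

```python
# Toy model of multi-level collation with an expansion.
# All weights below are hypothetical; they are NOT real ICU or UCA data.

# Each character maps to a list of (primary, secondary, tertiary) elements.
TOY_ELEMENTS = {
    "r": [(1, 0, 0)],
    "s": [(2, 0, 0)],
    "t": [(3, 0, 0)],
    # Expansion: one character yields two elements, like "ss",
    # with a tertiary mark on the second element (an assumption).
    "ß": [(2, 0, 0), (2, 0, 1)],
}

LEVEL_SEPARATOR = -1  # lower than any weight used in the toy table


def collation_elements(text):
    """Concatenate the toy collation elements for every character."""
    elements = []
    for ch in text:
        elements.extend(TOY_ELEMENTS[ch])
    return elements


def sort_key(text, strength=3):
    """Build a key: all primaries, then secondaries, then tertiaries.

    strength=1 compares primary weights only; strength=3 uses all levels.
    """
    elements = collation_elements(text)
    key = []
    for level in range(strength):
        key.extend(element[level] for element in elements)
        key.append(LEVEL_SEPARATOR)
    return tuple(key)


def same_primary(a, b):
    """True if two strings differ only below the primary level."""
    return sort_key(a, strength=1) == sort_key(b, strength=1)


if __name__ == "__main__":
    print(same_primary("ß", "ss"))  # ß is primary-equal to ss
    print(same_primary("ß", "s"))   # but primary-different from s
    print(sorted(["ß", "st", "ss", "s", "sr"], key=sort_key))
```

A check like same_primary is roughly what the chart generation would need in order to place ß next to ss instead of showing it as a separate primary entry.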

Enjoy, and thank you very much for your help,

markus


