RE: is there any way to change already defined character codes?

From: Marco.Cimarosti@icl.com
Date: Tue Aug 08 2000 - 07:49:52 EDT


Sandro Karumidze wrote:
> The issue is that in Unicode the sequence of Georgian
> characters is different from what these people think it should be.
> [...] At the beginning of this century 5 characters were dropped
> [...]
> In Unicode these 5 characters follow the other 33. There is a different
> point of view that those 5 should be included among the
> others.

(You definitely need an official reply, but let's go on with some more
informal chatting.)

I foresee that this would not be considered a good reason to change
anything.

The order of characters in Unicode (or in any other character encoding) is
not important. The purpose of a character set is to assign a unique number
to each character, not to define an "alphabetical order".
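A small Python illustration of the point (just a sketch; Python's default string sort happens to compare code points, which makes it a handy stand-in for "encoding order"):

```python
# Each character is identified by one number, its code point.
for ch in ["e", "f", "é"]:
    print(ch, f"U+{ord(ch):04X}")  # e U+0065, then f U+0066, then é U+00E9

# Sorting by code point is numeric order, not alphabetical order:
# "été" lands after "fête", because U+00E9 is larger than U+0066.
words = ["été", "fête", "eau"]
print(sorted(words))  # ['eau', 'fête', 'été']
```

No French dictionary would put "été" after "fête", but the encoding has no reason to care.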

If you look closely, you will notice that the situation you describe is
true for *all* the alphabets in Unicode.

E.g., if you look at the Latin part, you will see that the 26 letters used
in modern English are ordered contiguously in two ranges: U+0041 to U+005A
(uppercase) and U+0061 to U+007A (lowercase).

But that's the end of the story! The hundreds of other Latin letters are
scattered all over, in no consistent order.
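To see both halves of this in Python (a quick sketch; the letters in the second loop are just a few arbitrary examples of non-English Latin letters):

```python
# The basic English letters: two contiguous runs of code points.
upper = "".join(chr(cp) for cp in range(0x41, 0x5A + 1))
lower = "".join(chr(cp) for cp in range(0x61, 0x7A + 1))
print(upper)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(lower)  # abcdefghijklmnopqrstuvwxyz

# A few other Latin letters: each sits wherever it happened to be encoded,
# with no relation to where it belongs in any language's alphabet.
for ch in ["ß", "ø", "ş", "ŋ", "ə"]:
    print(ch, f"U+{ord(ch):04X}")
```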

The same is true for Cyrillic, Greek, Hebrew, Arabic, and so on. Have a look
at those blocks: the basic letters for post-czarist Russian, modern Greek,
Israeli Hebrew, modern Arabic, etc. are consistently ordered, but the letters
for other languages that use the same alphabets (or ancient letters for the
same languages) are scattered all over in no specific order.

The reason why no one cares about the order of characters is that it is
*impossible* to determine a "correct" order.

In alphabets used by more than one language (e.g., Latin, Cyrillic, Arabic,
Devanagari), the alphabetical order is normally different for each
language.
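A minimal sketch of per-language ordering in Python. The helper `make_key` is hypothetical, and both alphabets are heavily simplified: German is shown in its dictionary convention, where "ä" sorts together with "a", while Swedish places "å", "ä", "ö" after "z".

```python
import string


def make_key(groups):
    """Sort key built from an explicit letter order (hypothetical helper).

    groups: list of strings; all letters in one string share the same rank.
    Letters not listed rank after every listed letter.
    """
    rank = {letter: pos for pos, group in enumerate(groups) for letter in group}
    unknown = len(groups)

    def key(word):
        w = word.lower()
        # Compare by letter rank first; fall back to the raw text on ties.
        return ([rank.get(c, unknown) for c in w], w)

    return key


# Simplified German dictionary order: umlauts share a rank with their base letter.
UMLAUT = {"a": "ä", "o": "ö", "u": "ü"}
GERMAN = [c + UMLAUT.get(c, "") for c in string.ascii_lowercase]

# Simplified Swedish order: å, ä, ö come after z.
SWEDISH = list(string.ascii_lowercase) + ["å", "ä", "ö"]

words = ["zoo", "öga", "arm", "ära"]
print(sorted(words, key=make_key(GERMAN)))   # ['ära', 'arm', 'öga', 'zoo']
print(sorted(words, key=make_key(SWEDISH)))  # ['arm', 'zoo', 'ära', 'öga']
```

Same letters, same code points, two different "correct" orders.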

Moreover, many languages have more than one alphabetical order, all equally
valid and in current use.
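German is a well-known case: dictionaries (DIN 5007-1 style) treat "ü" like "u", while phone books (DIN 5007-2 style) treat it like "ue". A rough Python sketch covering only the three umlauts; the key function names are made up for illustration:

```python
# Two German orders, simplified to the three umlauts.
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})     # DIN 5007-1 style
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})   # DIN 5007-2 style


def dictionary_key(word):
    w = word.lower()
    # Compare with umlauts folded to the base vowel; break ties on the raw text.
    return (w.translate(DICTIONARY), w)


def phonebook_key(word):
    w = word.lower()
    # Compare with umlauts expanded to vowel + "e"; break ties on the raw text.
    return (w.translate(PHONEBOOK), w)


names = ["Müller", "Mulder", "Mueller"]
print(sorted(names, key=dictionary_key))  # ['Mueller', 'Mulder', 'Müller']
print(sorted(names, key=phonebook_key))   # ['Mueller', 'Müller', 'Mulder']
```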

For this reason, the problem of "alphabetical order" has been separated from
character sets and addressed on its own.
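Concretely, handling it separately means the comparison is a layer on top of the code points rather than a property of them. A toy Python sketch of that layering: strip accents for a first comparison, and use the accents only to break ties. This is loosely inspired by the multi-level idea behind Unicode collation; it is not the Unicode Collation Algorithm itself.

```python
import unicodedata


def base_letters(word):
    """Drop accents: decompose (NFD), then discard combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def collation_key(word):
    # First level: letters without accents. Second level: the original text,
    # used only when two words are equal at the first level.
    return (base_letters(word), word)


words = ["rose", "résumé", "rasé", "resume"]
print(sorted(words))                     # ['rasé', 'resume', 'rose', 'résumé']
print(sorted(words, key=collation_key))  # ['rasé', 'resume', 'résumé', 'rose']
```

The code points never change; only the key used for comparison does.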

In Unicode, the issue of "collation" is handled by a dedicated, optional
algorithm, which is part of the standard but is kept separate from the
encoding itself.

The algorithm is titled "Unicode Technical Report #10: Unicode Collation
Algorithm", and you can find it here:
http://www.unicode.org/unicode/reports/tr10/ .

*That* is the place to check whether the Georgian letters are in the correct
order or not. And if they are not, you have two options:

1) Ask Unicode to change it: here you *do* have a chance of being heard,
if you have valid arguments.

2) Change it yourself: unlike the character values, the collation algorithm
is designed to be flexible and customizable.
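As a sketch of option 2 in Python: start from the plain code-point order (the 33 modern letters U+10D0..U+10F0, followed by the five letters U+10F1..U+10F5) and move letters to the positions you want. The helpers `tailor` and `key_for` are hypothetical, and so is the placement used below (U+10F1 right after U+10D6); it only shows the mechanism, and the actual order should come from people who know the Georgian tradition. A real implementation would express the same idea as a tailoring of the collation algorithm rather than with this toy key.

```python
# Default order: plain code-point order, U+10D0..U+10F0 then U+10F1..U+10F5.
DEFAULT = [chr(cp) for cp in range(0x10D0, 0x10F5 + 1)]


def tailor(order, placements):
    """Return a new letter order with each letter moved right after another.

    placements: list of (letter, after) pairs, applied in sequence.
    (Hypothetical helper, for illustration only.)
    """
    order = list(order)
    for letter, after in placements:
        order.remove(letter)
        order.insert(order.index(after) + 1, letter)
    return order


def key_for(order):
    rank = {letter: pos for pos, letter in enumerate(order)}
    # Characters outside the tailored list sort after it, by code point.
    return lambda word: [(rank.get(c, len(order)), ord(c)) for c in word]


# Hypothetical tailoring: put U+10F1 right after U+10D6.
TAILORED = tailor(DEFAULT, [("\u10F1", "\u10D6")])

words = ["\u10D7", "\u10F1"]
print(sorted(words))                         # code-point order: U+10D7 first
print(sorted(words, key=key_for(TAILORED)))  # tailored order: U+10F1 first
```

The character values stay exactly as encoded; only the collation data changes.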

Regards,
_ Marco



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT