Re: is there any way to change already defined character codes?

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Tue Aug 08 2000 - 07:14:53 EDT


Sandro,

Are you basically wanting the ordering to be different?

Unicode makes no express or implied warranty that the ordering of
characters will be anything like what a user would expect (how can it, when
so many languages that use the same script have entirely different,
occasionally conflicting, collation rules?).

It is up to the software to make the necessary collation rules happen.

For example, in Windows 2000 there are two different sorts supported for
Georgian: "modern" and "traditional." The difference is that modern has four
letters (He, Hie, We, and Har, both Capital and Small) sort at the end of
the alphabet (which I presume corresponds to the sort that you do not
like?), while the traditional sort has:

* He appearing between Zen and Tan
* Hie appearing between Nar and On
* We appearing between Un and Phar
* Har appearing between Xan and Jhan

I presume the above "exceptions" more closely match the sort you would
expect? And if there are more, this would be very valuable information (the
rule behind all new "sorts" like this is that a valid need to sort text
differently was identified).
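
To make the two orderings concrete, here is a minimal Python sketch. It is
not how Windows implements its sort tables; it only covers the small
(Mkhedruli) letters, it assumes the Unicode code points U+10D0..U+10F4 for
them, and the names MODERN, TRADITIONAL, and sort_key are made up for this
illustration.

```python
# The 33 modern Mkhedruli letters AN..HAE, U+10D0..U+10F0, in code point order.
BASE = [chr(cp) for cp in range(0x10D0, 0x10F1)]

# The four letters discussed above (assumed code points).
HE, HIE, WE, HAR = "\u10F1", "\u10F2", "\u10F3", "\u10F4"
# The letters they follow in the traditional order.
ZEN, NAR, UN, XAN = "\u10D6", "\u10DC", "\u10E3", "\u10EE"

# "Modern": the four letters sort at the end of the alphabet.
MODERN = BASE + [HE, HIE, WE, HAR]

# "Traditional": He after Zen, Hie after Nar, We after Un, Har after Xan.
FOLLOWED_BY = {ZEN: HE, NAR: HIE, UN: WE, XAN: HAR}
TRADITIONAL = []
for ch in BASE:
    TRADITIONAL.append(ch)
    if ch in FOLLOWED_BY:
        TRADITIONAL.append(FOLLOWED_BY[ch])


def sort_key(order):
    """Build a key function that ranks characters by their position in `order`.

    Characters not listed in `order` sort after all listed ones, by code point.
    """
    rank = {ch: i for i, ch in enumerate(order)}

    def key(text):
        return [rank.get(ch, len(order) + ord(ch)) for ch in text]

    return key


letters = ["\u10D7", HE, ZEN]  # Tan, He, Zen
modern_sorted = sorted(letters, key=sort_key(MODERN))
traditional_sorted = sorted(letters, key=sort_key(TRADITIONAL))
```

With these tables, traditional_sorted places He between Zen and Tan, while
modern_sorted places it after both. The point is that the ordering lives
entirely in the table the software chooses, not in the code points.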

As a rule, Unicode code point order is not intended to follow, nor does it
explicitly attempt to follow, any kind of collation rules.

FWIW, the LCIDs behind these two sorts under Windows 2000 (used in the C
CompareString and the VB StrComp) are:

Traditional: 1079 (0x0437)
Modern: 66615 (0x10437)
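
The two values differ only in the sort ID stored above the 16-bit language
ID. A small Python sketch of that arithmetic, mirroring what the Win32
MAKELCID, LANGIDFROMLCID, and SORTIDFROMLCID macros do (the Python function
names here are my own, not an API):

```python
GEORGIAN_LANGID = 0x0437  # language ID shared by both sorts


def make_lcid(lang_id, sort_id):
    """Combine a 16-bit language ID with a sort ID, as MAKELCID does."""
    return (sort_id << 16) | (lang_id & 0xFFFF)


def split_lcid(lcid):
    """Return (language ID, sort ID) for an LCID."""
    return lcid & 0xFFFF, (lcid >> 16) & 0xF


traditional = make_lcid(GEORGIAN_LANGID, 0)  # sort ID 0: traditional
modern = make_lcid(GEORGIAN_LANGID, 1)       # sort ID 1: modern
```

Here traditional comes out as 1079 and modern as 66615, matching the values
above.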

michka

----- Original Message -----
From: "Sandro Karumidze" <sandro@osgf.ge>
To: "Unicode List" <unicode@unicode.org>
Cc: "Unicode List" <unicode@unicode.org>
Sent: Tuesday, August 08, 2000 3:26 AM
Subject: Re: is there any way to change already defined character codes?

> Dear Chris,
>
> Thank you for your answer.
>
> > May I ask what is the reason these people from the government of Georgia
> > want to change the codepoints of some Georgian characters? There is
> > probably another good solution (or solutions) for whatever problem they
> > think would be solved by changing encoding points.
>
> The issue is that in Unicode the sequence of Georgian characters is
> different from what these people think it should be.
>
> In modern Georgian there are 33 widely used characters. However, before
> there were 38 characters. At the beginning of this century 5 characters were
> dropped, though they are still used in old texts and by language specialists.
>
> In Unicode these 5 characters follow the 33. There is a different point of
> view that those 5 should be included among the others.
>
> This is the whole issue - there are no specific implementation difficulties
> or problems. The only point is that placing the 5 among the other 33 is more
> "correct".
>
> Best regards,
>
> Sandro Karumidze
>
>
>
>
>
> >
> > Regards
> >
> > - Chris
> >
> > "Sandro Karumidze" <sandro@osgf.ge> wrote:
> >
> > > There are people from the government of Georgia interested in the
> > > possibility of altering the Unicode standard in terms of changing the
> > > codes for some Georgian characters.
> >
> > > Does this type of thing happen in the Consortium, and if yes, under what
> > > circumstances?
> >
> > > If not, can you specify in which rules it is defined that these types of
> > > changes are not allowed?
> >
> > > Thanks in advance for your support,
> >
> > > Best regards,
> >
> > > Sandro Karumidze
>
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT