Re: Eyelash Ra/Variant Mark? (collation issue)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu May 28 1998 - 13:18:07 EDT


Jeroen noted:

>
> > 2. It will cause difficulty in default sort order, if it is placed at the
> > end of current assignments.
>
> Recently, a default Unicode collating algorithm was proposed
> in a technical report. If that is going to be implemented (hopefully
> as part of a standard Unicode API) code-points will not be very
> relevant to sorting applications, so this is no objection to me.
>

The Indic scripts are accounted for in the Unicode collating
algorithm, so that the individual code point positions for any
new character are not problematical for culturally correct collation.
That said, of course, because binary order may often be used as
a fallback in simple situations, it is better to keep things in
a reasonable order, rather than arbitrarily assigned.
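
To make the distinction concrete, here is a minimal sketch in Python
contrasting binary (code point) order with a table-driven order. The
weight table and the function names are invented for illustration only;
they are not the weights or the interface of the Unicode collating
algorithm, just a toy showing why a character's code point position need
not determine where it sorts.

```python
# Toy illustration only: these weights are invented for this sketch and
# are NOT taken from the Unicode collation technical report.
TOY_WEIGHTS = {"a": 1, "A": 1, "b": 2, "B": 2, "c": 3, "C": 3}


def binary_key(s):
    """Fallback ordering: compare raw code point values."""
    return [ord(ch) for ch in s]


def toy_collation_key(s):
    """Table-driven ordering: each character is looked up in the weight
    table, so its code point position does not decide the result.
    Characters missing from the toy table sort after all listed ones.
    Code point order is used only as a final tie-breaker."""
    primary = [TOY_WEIGHTS.get(ch, 1000 + ord(ch)) for ch in s]
    return (primary, binary_key(s))


words = ["b", "A", "a", "B", "c"]
by_code_point = sorted(words, key=binary_key)
by_weights = sorted(words, key=toy_collation_key)

print(by_code_point)  # ['A', 'B', 'a', 'b', 'c']
print(by_weights)     # ['A', 'a', 'B', 'b', 'c']
```

With the table in place, a newly added character could sit anywhere in
the code space and still sort where the table says it should; only the
binary fallback is sensitive to where it was assigned.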

But I do want to point out that contrary to Jeroen's hope, there is
no likelihood of the UTC promoting a "standard Unicode API" for
collation. The Unicode Technical Committee proposes, reviews and
standardizes algorithms which may be of use in promoting interoperable
implementations of the standard. But APIs tend to be vendor-specific,
and are left to the discretion of the vendors themselves, or to
the initiative of others who may wish to develop and distribute
libraries (freeware or for profit) that implement APIs related
to the Unicode Standard.

--Ken Whistler


