Re: Ordering of scripts in DUCET?

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Dec 02 2008 - 16:29:48 CST


Harold S. Henry asked:

> Is there a documented logic or specification that governs the ordering of
> scripts in DUCET?

No, although when the original table was created many years
ago, there were some strong opinions expressed regarding
the required ordering of scripts. The default position had
been to simply follow the block order in the standard
(except of course for each script coming in a chunk in the
table -- so all of Latin would be together in order, even
though the Latin script is split across many blocks in
the standard). Others felt that a roughly geographical
ordering was imperative -- and that in fact accounts for
why Georgian occurs in the script order it does, instead
of later in the table, for example.

But there is no over-all principle that could govern the
ordering of more than 100 scripts with respect to each
other, without running into multiple exceptions and
edge cases. Ask any 5 script experts to logically order
all the Unicode scripts, and I expect you'll get 5 rather
different answers, depending on each expert's principles for
deciding which scripts count as alike or different, and on
what kind of cataloging makes sense to each of them personally.

Also I don't think the UTC has judged the relative order
of scripts in the table as significant enough to require
attempting to rationalize the order of new scripts in
terms of a specification. There are relatively few
applications of truly multi-script sorting -- most of what
people actually care about is the ordering of strings
*within* a script -- whether strings in some other script
that a user is unconcerned about occur before or after
those of the script they *do* care about is generally
immaterial.

> In other words, is there a way to predict where future
> scripts will be inserted into the current primary collation order for
> letters?

Nope. In part because there isn't even any way to
predict which scripts *will* be encoded, and in what
timeframe with respect to each other.

If anyone has particular suggestions regarding where a
script currently under ballot for ISO/IEC 10646, and for
prospective inclusion in Unicode, should reasonably sort in
the DUCET table for a future UCA version, they can always make
a suggestion to the UTC, which would take it into consideration
in trying to develop the next delta for the table. Otherwise,
the most likely outcome will simply be for it to be
stuck near whichever script currently in the table seems
typologically, genetically, and/or geographically most like the
one to be added.

--Ken
