Re: Tibetan Collation

From: Christopher Fynn (cfynn@gmx.net)
Date: Sun Dec 14 2008 - 00:06:24 CST


Hi Mark

All modern languages using the Tibetan script (Tibetan, Dzongkha, Ladakhi
etc.) use the same ordering rules. There are some very minor
differences in the order used in different Tibetan dictionaries, but
these are due to differences in the way particular authors treat edge
cases rather than differences in ordering rules for particular languages.

Dzongkha is slightly more complex than Tibetan in that some words
contain a second "root" stack, which requires a couple of additional
collation elements to handle. Although these additional elements might
be removed for Tibetan, they do not affect the ordering of Tibetan
proper. In effect, for collation, the ordering rules for Tibetan are a
subset of those for Dzongkha.

Other than these additional cases, the ordering in the Dzongkha data is
based on the three-volume Great Tibetan-Tibetan-Chinese Dictionary (ISBN
81-206-0455-5), which is the closest thing there is to a "standard"
dictionary of Tibetan.

Since the ordering for all languages using the Tibetan script is the
same, this ordering is language-neutral. The current default order is
wrong for any language using the Tibetan script.
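
To illustrate why a plain code point order cannot work, here is a
minimal Python sketch. It is not taken from the Dzongkha data or from
any dictionary: the three sample words, the hand-made ROOT_LETTER table
and the function names are assumptions made up for this example, and
the tie-break is deliberately crude. It only shows that a prefixed
syllable such as bka' has to sort under its root letter ka, not under
its prefix ba.

```python
# Toy illustration only: root letters are assigned by hand here,
# whereas a real collation tailoring derives the order from rules.
KA, GA, BA, ACHUNG = "\u0f40", "\u0f42", "\u0f56", "\u0f60"

BKA = BA + KA + ACHUNG  # bka': prefix ba, root letter ka, suffix 'a

words = [GA, BKA, KA]

# Hypothetical lookup table: sample word -> its root letter.
ROOT_LETTER = {
    KA: KA,
    GA: GA,
    BKA: KA,
}


def codepoint_order(ws):
    """Sort by raw code points, i.e. by the first letter as written."""
    return sorted(ws)


def root_letter_order(ws):
    """Sort by root letter first; the full string is a crude tie-break."""
    return sorted(ws, key=lambda w: (ROOT_LETTER[w], w))


if __name__ == "__main__":
    print("code point order: ", codepoint_order(words))
    print("root letter order:", root_letter_order(words))
```

With code point order bka' lands after ga because it begins with ba;
with the root-letter key it is grouped with ka, ahead of ga. Real
Tibetan and Dzongkha collation also has to handle superscribed and
subjoined letters and the other cases mentioned in this thread, which
this sketch does not attempt.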

If you need confirmation of the Dzongkha data, I could get this from the
Dzongkha Development Commission, which is the *official* government
language body of Bhutan, where Dzongkha is the national language.

- Chris

Thimphu, Bhutan

Mark Davis wrote:
> (adding unicode.org <http://unicode.org>, since this might be relevant
> to some there)
>
> The main collation charts are for the UCA, which is a language-neutral
> ordering of all Unicode characters. The UCA ordering is not
> necessarily correct for any language using any particular script, since
> the complications required to get correct sorting typically have to be
> on a language-by-language basis. You happen to be looking at the chart
> for characters in the Tibetan script using the UCA.
>
> As yet, we don't show collation charts for the language-specific rules
> in CLDR. However, there is no locale data for Tibetan [bo] in any event,
> and we wouldn't add collation data if there isn't at least minimal
> general locale data.
>
> There is locale data for Dzongkha [dz]
> (http://www.unicode.org/cldr/data/common/main/dz.xml), although minimal,
> and a collation sequence for that: see
> http://www.unicode.org/cldr/data/common/collation/dz.xml. The status is
> draft="unconfirmed", because there has not been enough participation.
> CLDR locales work somewhat like an open-source project - locale data is
> developed and enhanced based on the interest of participating
> organizations in doing the work for the particular language. This can be
> Unicode members, but also liaison
> organizations http://www.unicode.org/consortium/memblogo.html#liais and
> others. If you have any further questions, please let me know.
>
> Mark
>
>
> On Sat, Dec 13, 2008 at 06:58, Christopher Fynn <cfynn@gmx.net> wrote:
>
> The current collation chart for Tibetan
> <http://www.unicode.org/charts/collation/chart_Tibetan.html>
> looks *completely* broken as it does not handle prefixes, rago,
> lago, sago, etc.
>
>
> Tibetan should be almost identical to Dzongkha
> e.g. <http://developer.mimer.com/charts/dzongkha.htm>
> which also gives a correct collation for Tibetan.
>
> - Chris


