Re: CLDR and locale designations (was: [OT] Even viruses are now i18n!)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Apr 23 2004 - 10:49:04 EDT

Next message: Edward H. Trager: "Re: Unihan.txt and the four dictionary sorting algorithm"

Previous message: Jon Hanna: "RE: [OT] Even viruses are now i18n!"
In reply to: Antoine Leca: "Re: [OT] Even viruses are now i18n!"
Next in thread: Marco Cimarosti: "RE: [OT] Even viruses are now i18n!"

From: "Antoine Leca" <Antoine10646@leca-marti.org>
> > Never forget that language codes and country/territory codes are
> > different...
>
> We were speaking about ccTLD. A different beast. Try to resolve ANYTHING.GB.
> on a root server, or alternatively to seek UK in ISO 3166, to understand
> what I mean.

I'm not speaking about ccTLDs either... but a domain name ending in .gb or .fx
could be valid if some DNS record exists for it. ccTLDs inherit from legacy
assignments by IANA, and even today the IANA and RIR databases contain
references to both the GB and UK country/territory codes.

Look closely at ISO 3166, and you'll see that [UK] is reserved alongside [GB],
even though only [GB] is assigned. You'll see other reserved entries used by
ITU, such as [EA] for Ceuta and Melilla, two small Spanish dependencies on the
coast of Morocco, with a status similar to Gibraltar, a British dependency on
the southern coast of Spain which has an assignment in ISO 3166. Look also for
[DG], which is used by ITU for Diego Garcia, even though it is part of the
British Indian Ocean Territory, which has the ISO 3166 code [IO].

ISO 3166 has its imperfections, but at least it contains enough references to
reserve all the codes used by IANA for ccTLDs, as well as some non-territory
codes used for groups of countries by WIPO...

Now consider that software actually rarely needs country/territory codes for
its internationalization, but rather needs some code to differentiate scripts
and script variants (such as between Latin and Cyrillic Serbian, or between
Traditional and Simplified Chinese). You'll see the problems this causes in
internationalized software when one has to set the locale code to zh_TW to
refer to Traditional Chinese, even when the resources are meant for language
variants used in areas other than Taiwan. Which code must be used to create
resources in Serbian Cyrillic? [sh_YU], [sh_CS], [sr_CS]? How can we avoid the
confusion with Latin script versions?
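
To make the question concrete, here is a minimal Python sketch of what a
locale identifier with an explicit script field could look like. The syntax
language[_Script][_REGION], the four-letter script values (Cyrl, Latn, Hant)
and the function name parse_locale are all assumptions made for illustration;
they are not something current locale identifiers support.

```python
import re

# Hypothetical syntax: language[_Script][_REGION], e.g. "sr_Cyrl_CS".
# The optional four-letter script field is an assumption of this sketch.
_LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:[_-](?P<script>[A-Za-z]{4}))?"
    r"(?:[_-](?P<region>[A-Za-z]{2}))?$"
)


def parse_locale(tag):
    """Split a locale identifier into (language, script, region).

    Missing fields are returned as None.
    """
    match = _LOCALE_RE.match(tag)
    if match is None:
        raise ValueError("unrecognized locale identifier: %r" % tag)
    script = match.group("script")
    region = match.group("region")
    return (
        match.group("language"),
        script.title() if script else None,   # normalize to "Cyrl" casing
        region.upper() if region else None,   # normalize to "CS" casing
    )
```

With such a field, parse_locale("sr_Cyrl_CS") and parse_locale("sr_Latn_CS")
differ only in the script, and parse_locale("zh_Hant") names Traditional
Chinese without implying Taiwan.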

In fact the problem is not in ISO 3166, but in RFC 3066 for the designation of
locales. This comes from imperfections in the ISO 639 standard, which has great
difficulty encoding languages... and even more so when it needs to distinguish
between languages written in several scripts (thankfully we now have codes for
scripts, maintained by Unicode, but there's currently no support for them in
locale identifiers...)

Country/territory codes are too unstable to correctly tag the language used in
documents and applications, but the combination of ISO 639 and ISO 3166 is for
now the only widely supported option. So within locales, the ISO 3166
country/territory code has lost its original function of designating a
territory. Instead it designates a language variant.

I am also thinking of the case of Norwegian [no], which has two major variants:
Bokmål, the traditional "book" orthography, and Nynorsk, the reformed "new"
language; in ISO 639 we find new codes [nn] for Nynorsk and [nb] for Bokmål.
Imagine the complication for software that should run with a Norwegian UI.
Which code should be used?
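
One way software could cope is to try several codes in turn. The sketch below
is only an illustration: the table EQUIVALENTS, the function pick_language,
and especially the order in which [nb] and [nn] are tried for a generic [no]
request are arbitrary assumptions, not rules taken from any standard.

```python
# Hypothetical preference table: codes to try for a requested language.
# Trying Bokmål before Nynorsk for "no" is an arbitrary choice of this sketch.
EQUIVALENTS = {
    "no": ["no", "nb", "nn"],
    "nb": ["nb", "no"],
    "nn": ["nn", "no"],
}


def pick_language(requested, available):
    """Return the first acceptable code present in `available`, or None."""
    for code in EQUIVALENTS.get(requested, [requested]):
        if code in available:
            return code
    return None
```

For example, pick_language("no", {"nb", "en"}) selects the [nb] resources, and
a request for [nn] falls back to resources tagged with the older [no] code
when nothing more specific is installed.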

We also find [ax] for the Åland variant of Swedish spoken in the Åland Islands
[AX], a dependency of Finland [FI]. Some software incorrectly assumes that this
language is Finnish when it is in fact a variant of Swedish [sv]. Should
software use [sv] or [ax]? Some software has chosen to use [sv_FI] to refer to
the Åland language, because it really is the Swedish language spoken in a part
of Finland... How can those rules be inferred in locale-aware software or a
locale-aware system?
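
The simplest rule a system can apply is truncation: drop the last field of the
locale name until some resource matches. The Python sketch below shows only
that rule, under the assumption that fields are separated by "_"; the names
fallback_chain and find_resource are hypothetical. Note what it cannot do: it
finds [sv] from [sv_FI], but nothing in it knows that a Finnish territory code
may go with a Swedish language code.

```python
def fallback_chain(locale):
    """List candidate locale names from most to least specific.

    Fields are assumed to be separated by "_", as in "sv_FI".
    """
    parts = locale.split("_")
    chain = []
    while parts:
        chain.append("_".join(parts))
        parts.pop()  # drop the most specific remaining field
    return chain


def find_resource(locale, available):
    """Return the first candidate of the chain found in `available`."""
    for candidate in fallback_chain(locale):
        if candidate in available:
            return candidate
    return None
```

Here find_resource("sv_FI", {"sv", "fi"}) yields the Swedish resources rather
than the Finnish ones, which is the behaviour the [sv_FI] convention relies
on.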

This archive was generated by hypermail 2.1.5 : Fri Apr 23 2004 - 11:38:35 EDT