From: Philippe Verdy (firstname.lastname@example.org)
Date: Mon Mar 14 2005 - 13:40:44 CST
From: "Jukka K. Korpela" <email@example.com>
> On Mon, 14 Mar 2005, Philippe Verdy wrote:
>> I have just seen in the CLDR repository a reference to the 2-letter code
>> "sh" used as an alias for the Serbian language with the Latin variant.
> The code "sh" was assigned to Serbo-Croatian. It was deprecated
> 2000-02-18 in favor of the codes "sr" for Serbian, "hr" for Croatian.
> I suppose the political issues behind this are widely known.
> As far as I can see, "sh" was a code for Serbo-Croatian irrespective of
> the writing system (script).
>> According to ISO-639-1, "sh" does not seem assigned, but it may be still
>> interesting code for software localization purpose, because using "hr"
>> (Croatian) for handling the Serbian vocabulary which shares the same
>> script does not seem appropriate, and using "sr" is already needed for
>> localizing software to traditional Serbian Cyrillic.
> For new data, "hr" and "sr" are to be used, and they indicate language
> forms, not necessarily implying a writing system. When Serbian is written
> in Latin letters, then the script can be specified separately, instead of
> encoding it into the primary language code.
Unfortunately, script selection is not available in many localization APIs,
at least not in Java, which only considers these locale fields:
- language code, according to ISO-639-1 or -2
- country/region code, according to ISO-3166 (but with lots of caveats
because of its instability and the fact that if it is used to differentiate
languages/scripts then it loses its ability to designate the effective
country/region; see for example zh_TW, used in fact to designate Traditional
Chinese, wherever it is used in mainland Southern China, Hong Kong, or
- variant code, which obeys no standard and is just used to tweak resources
in non-interoperable ways
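
To make the three fields concrete, here is a minimal sketch using
java.util.Locale. The class name LocaleFieldsSketch and the variant string
"Latin" are assumptions made up for illustration; no standard defines that
variant, which is exactly the interoperability problem described above.

```java
import java.util.Locale;

public class LocaleFieldsSketch {
    // Shows the only three fields the classic Locale constructors carry.
    public static String describe(Locale locale) {
        return "language=" + locale.getLanguage()
             + " country=" + locale.getCountry()
             + " variant=" + locale.getVariant();
    }

    public static void main(String[] args) {
        // The region code is doing the job of a script selector here.
        Locale traditionalChinese = new Locale("zh", "TW");
        // Ad hoc variant (hypothetical): another application need not understand it.
        Locale serbianLatin = new Locale("sr", "", "Latin");

        System.out.println(describe(traditionalChinese));
        System.out.println(describe(serbianLatin));
    }
}
```

Nothing in these three fields says "script", so the information has to be
carried either by the region or by an application-specific variant.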
I expect that the future ISO locale code standard will standardize not only
the new form of locale codes, but also a *working* API or algorithm to
correctly match locales in all their aspects: linguistic, orthographic
(script), legal (countries)... Parsing locale codes should not require manual
tweaks in every application; notably, one should be able to set a user locale
that works independently of the target application that uses it. I am
really not satisfied with the two simplistic algorithms present for now in
If "sh" is indeed deprecated, this alias in CLDR may simplify the
distinction between Serbian Cyrillic (sr) and Serbian Latin (sh), leaving
Bosnian Latin with its code (bs), as well as Croatian (hr), without
needing to manage script codes...
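
As a rough sketch of what this would mean for an application, the lookup
below selects a written form from the language code alone. The class name,
the table and its labels are hypothetical and are not taken from CLDR data;
they only restate the four codes listed above.

```java
import java.util.HashMap;
import java.util.Map;

public class LanguageCodeOnlySketch {
    // Hypothetical table: one language code per written form, no script field.
    private static final Map<String, String> WRITTEN_FORM = new HashMap<>();
    static {
        WRITTEN_FORM.put("sr", "Serbian, Cyrillic");
        WRITTEN_FORM.put("sh", "Serbian, Latin");
        WRITTEN_FORM.put("bs", "Bosnian, Latin");
        WRITTEN_FORM.put("hr", "Croatian");
    }

    // Returns the written form for a code, or "unknown" if it is not in the table.
    public static String writtenForm(String languageCode) {
        String form = WRITTEN_FORM.get(languageCode);
        return form != null ? form : "unknown";
    }

    public static void main(String[] args) {
        System.out.println(writtenForm("sh"));
        System.out.println(writtenForm("sr"));
    }
}
```

The trade-off is that the table hard-codes a script choice into the language
code, which is what the separate script specification is meant to avoid.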
I am much less concerned about the legacy use of "sh", which was ambiguous
(was Serbo-Croatian labelled with "sh" really Latin in fact?) and does not
seem to conflict with a more precise use of this code for modern applications
that need a distinction between the two scripts used for Serbian... As a
transitory measure, the alias has its utility because it helps