From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Mar 14 2005 - 11:32:29 CST
On Mon, 14 Mar 2005, Philippe Verdy wrote:
> I have just seen in the CLDR repository a reference to the 2-letter code
> "sh" used as an alias for the Serbian language with the Latin variant.
The code "sh" was assigned to Serbo-Croatian. It was deprecated on
2000-02-18 in favor of the codes "sr" for Serbian and "hr" for Croatian.
I suppose the political issues behind this are widely known.
As far as I can see, "sh" was a code for Serbo-Croatian irrespective of
the writing system (script).
> According to ISO-639-1, "sh" does not seem assigned, but it may be still an
> interesting code for software localization purpose, because using "hr"
> (Croatian) for handling the Serbian vocabulary which shares the same Latin
> script does not seem appropriate, and using "sr" is already needed for
> localizing software to traditional Serbian Cyrillic.
For new data, "hr" and "sr" are to be used, and they indicate language
forms, not necessarily implying a writing system. When Serbian is written
in Latin letters, then the script can be specified separately, instead of
encoding it into the primary language code.
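
To illustrate keeping the script separate from the language code, here is
a minimal Python sketch. It assumes the four-letter ISO 15924 script codes
("Latn", "Cyrl") joined to the language code with a hyphen; the function
names make_tag and split_tag are made up for this example and are not taken
from CLDR or any library.

```python
def make_tag(language, script=None):
    """Join a language code and an optional script code with a hyphen."""
    parts = [language.lower()]
    if script:
        # Script codes are conventionally written in title case, e.g. "Latn".
        parts.append(script.title())
    return "-".join(parts)


def split_tag(tag):
    """Split a tag into (language, script); script is None when absent."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    # Assumption: a four-letter alphabetic second subtag is a script code.
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1].title()
    return language, script


print(make_tag("sr", "Latn"))
print(split_tag("sr-Cyrl"))
print(split_tag("hr"))
```

The primary code stays "sr" in both cases, so software that only cares about
the language can ignore the script part.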
> So, what is the status of this "sh" language code? Is that just used in
> CLDR?
I'm afraid it might be removed from CLDR data as well. That would be a
mistake, however. I think that the code, no matter how deprecated for new
data, should be recognized in legacy data, and possibly rendered using a
localized string, such as "Serbo-Croatian". Continuity is important, even
for codes that might be regarded as deprecated, obsolete, and incorrect.
For example, if a bibliographic entry in a library database contains
information that says that some book or other document has code "sh"
assigned to it, this fact should be recognized even if we think that "sh"
should never be used for any data. It _has_ actually been used, and for
all we know, we may be unable to decide whether "sr" or "hr"
would now be a better code.
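
As a rough sketch of what such recognition could look like, the Python
below keeps deprecated codes in a table of their own, so that they can still
be displayed without being offered for new data. The table layout and the
name describe_language are assumptions made for illustration; only the
facts about "sh", "sr" and "hr" come from the discussion above.

```python
# Hypothetical tables; a real system would load these from its locale data.
CURRENT_LANGUAGES = {"sr": "Serbian", "hr": "Croatian"}
DEPRECATED_LANGUAGES = {
    "sh": {
        "name": "Serbo-Croatian",
        "deprecated": "2000-02-18",
        # Possible successors; legacy data alone cannot tell us which applies.
        "candidates": ("sr", "hr"),
    },
}


def describe_language(code):
    """Return (display name, status) without rewriting the stored code."""
    code = code.lower()
    if code in CURRENT_LANGUAGES:
        return CURRENT_LANGUAGES[code], "current"
    if code in DEPRECATED_LANGUAGES:
        return DEPRECATED_LANGUAGES[code]["name"], "deprecated"
    # Unknown codes are shown as they are rather than guessed at.
    return code, "unknown"


print(describe_language("sh"))
print(describe_language("sr"))
```

The lookup deliberately does not map "sh" to either successor; it only
reports what the legacy record says.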
Similar arguments apply to continued recognition, and possibly
localization, for territory codes that designate territories that no
longer exist as political entities. Here the argument is even stronger,
since changing the codes would be incorrect and anachronistic. If a book
was published in the Soviet Union, for example, its metadata should say
exactly this, without replacing the country code by one that designates a
contemporary political entity.
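
The same idea for territories, again only as a sketch: the stored code is
left exactly as it is, and only the display label is looked up. "SU" is the
withdrawn ISO 3166-1 code for the Soviet Union; the record layout and the
name publication_place are hypothetical.

```python
# Hypothetical display names for codes of territories that no longer exist.
FORMER_TERRITORIES = {"SU": "Soviet Union"}


def publication_place(record):
    """Return a display label for the record's country code.

    The record itself is not modified, and no successor state is substituted.
    """
    code = record["country"]
    name = FORMER_TERRITORIES.get(code)
    if name is not None:
        return f"{name} (former territory, code {code})"
    return code


book = {"title": "(some title)", "country": "SU"}
print(publication_place(book))
```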
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/