Re: Serbian-Latin "sh" alias and ISO-639-1 within CLDR

From: Jukka K. Korpela
Date: Mon Mar 14 2005 - 11:32:29 CST

    On Mon, 14 Mar 2005, Philippe Verdy wrote:

    > I have just seen in the CLDR repository a reference to the 2-letter code
    > "sh" used as an alias for the Serbian language with the Latin variant.

    The code "sh" was assigned to Serbo-Croatian. It was deprecated on
    2000-02-18 in favor of the codes "sr" for Serbian and "hr" for Croatian.
    I suppose the political issues behind this are widely known.
    As far as I can see, "sh" was a code for Serbo-Croatian irrespective of
    the writing system (script).

    > According to ISO-639-1, "sh" does not seem assigned, but it may be still an
    > interesting code for software localization purpose, because using "hr"
    > (Croatian) for handling the Serbian vocabulary which shares the same Latin
    > script does not seem appropriate, and using "sr" is already needed for
    > localizing software to traditional Serbian Cyrillic.

    For new data, "hr" and "sr" are to be used, and they indicate language
    forms, not necessarily implying a writing system. When Serbian is written
    in Latin letters, then the script can be specified separately, instead of
    encoding it into the primary language code.
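
    As a minimal sketch of keeping language and script separate, assume a
    tagging scheme that joins a primary language code and an optional
    four-letter ISO 15924 script code with a hyphen (for example "sr-Latn").
    The function names are hypothetical and not taken from CLDR or any
    library:

```python
def make_language_tag(language, script=None):
    """Join a language code and an optional script code with a hyphen."""
    # Assumed convention: language in lowercase, script in title case ("Latn").
    parts = [language.lower()]
    if script:
        parts.append(script.title())
    return "-".join(parts)


def split_language_tag(tag):
    """Return (language, script); script is None when the tag has none."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    # Assumption: a four-letter alphabetic second subtag is a script code.
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1].title()
    return language, script


print(make_language_tag("sr", "Latn"))   # Serbian in Latin letters
print(make_language_tag("sr", "Cyrl"))   # Serbian in Cyrillic letters
print(split_language_tag("hr"))          # no script specified
```

    In this sketch the primary code stays "sr" in both cases, and only the
    script part changes.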

    > So, what is the status of this "sh" language code? Is that just used in
    > CLDR?

    I'm afraid it might be removed from CLDR data as well. That would be a
    mistake, however. I think that the code, no matter how deprecated for new
    data, should be recognized in legacy data, and possibly rendered using a
    localized string, such as "Serbo-Croatian". Continuity is important, even
    for codes that might be regarded as deprecated, obsolete, and incorrect.

    For example, if a bibliographic entry in a library database contains
    information that says that some book or other document has code "sh"
    assigned to it, this fact should be recognized even if we think that "sh"
    should never be used for any data. It _has_ actually been used, and for
    all we know, we may be unable to decide whether "sr" or "hr"
    would now be a better code.
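
    One way to picture this is a lookup table that keeps deprecated codes
    and flags them, instead of dropping them or silently remapping them.
    This is only a sketch. The table, its English names and the function
    name are assumptions made for illustration, not CLDR's actual data
    format:

```python
# Hypothetical name table: code -> (display name, deprecated for new data?)
LANGUAGE_NAMES = {
    "sr": ("Serbian", False),
    "hr": ("Croatian", False),
    "sh": ("Serbo-Croatian", True),  # kept so that legacy data stays readable
}


def describe_language(code):
    """Look up a code without rewriting it; return None if it is unknown."""
    entry = LANGUAGE_NAMES.get(code.lower())
    if entry is None:
        return None
    name, deprecated = entry
    return {"code": code.lower(), "name": name, "deprecated": deprecated}


# A legacy catalogue entry tagged "sh" is still displayed meaningfully,
# and nothing guesses whether "sr" or "hr" was meant.
print(describe_language("sh"))
```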

    Similar arguments apply to continued recognition, and possibly
    localization, for territory codes that designate territories that no
    longer exist as political entities. Here the argument is even stronger,
    since changing the codes would be incorrect and anachronistic. If a book
    was published in the Soviet Union, for example, its metadata should say
    exactly this, without replacing the country code by one that designates a
    contemporary political entity.
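
    The same idea can be sketched for territory codes, assuming the
    two-letter code "SU" for the Soviet Union and a hypothetical record
    layout. The stored code is left untouched and only a display label is
    added:

```python
# Hypothetical table: code -> (display name, still an existing entity?)
TERRITORY_NAMES = {
    "SU": ("Soviet Union", False),
    "FI": ("Finland", True),
}


def country_label(code):
    """Return a display label for a territory code; unknown codes pass through."""
    entry = TERRITORY_NAMES.get(code.upper())
    if entry is None:
        return code
    name, current = entry
    return name if current else name + " (historical)"


def label_record(record):
    """Return a copy of the record with a label added; 'country' is not changed."""
    labelled = dict(record)
    labelled["country_label"] = country_label(record["country"])
    return labelled


book = {"title": "Example title", "country": "SU"}  # made-up record
print(label_record(book))
```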

    Jukka "Yucca" Korpela