Re: Serbian-Latin "sh" alias and ISO-639-1 within CLDR

From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Mar 16 2005 - 12:27:13 CST

Next message: Philippe VERDY: "Re: Decomposition vs Full decomposition?"

Previous message: Antoine Leca: "Re: French accented characters - observations of problems"
In reply to: Mark Davis: "Re: Serbian-Latin "sh" alias and ISO-639-1 within CLDR"
Next in thread: Philippe VERDY: "Re: Re: Serbian-Latin "sh" alias and ISO-639-1 within CLDR"
Maybe reply: Philippe VERDY: "Re: Re: Serbian-Latin "sh" alias and ISO-639-1 within CLDR"

> >Let me try this one more time. "sh" was fairly widely used to stand for
> >Serbian written in Latin.
>
> What?
> Where? When? By whom?

Let me be a bit more explicit. 'sh' does mean Serbo-Croatian, and nothing we
are doing denies that. The issue revolves around two different uses of
language tags.

1. Matching. Using 'sh' says that you want to match Serbo-Croatian in any
form, regardless of country or script. Using, say, sh-Latn-IT means that you
want Serbo-Croatian, but limited to Latin script as used in Italy. If I am
querying a collection of documents, I could use either tag to restrict which
documents match my query. I could get back 1000 documents or none, depending
on which database of documents I am searching and how restrictive my query
is.

2. Lookup. Looking up language data, for example for display on a web page,
is a different process. You typically don't have the option of displaying
nothing when there is no exact match. Instead, you fall back. If you don't
have data exactly matching sh-Latn-IT, you fall back to data for sh-Latn. If
you don't have that either, you fall back to data for sh. Now, whatever data
someone has associated with 'sh', it has to be a single, consistent set of
data, so it will be in either Latn or Cyrl. For the contents of the data
associated with 'sh', what CLDR does is use Serbian data in Latin script.
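The two processes can be contrasted in a short Python sketch. It is a
simplification, not CLDR's actual code: matching uses a plain prefix rule on
subtag boundaries (similar to the "basic filtering" later specified in
RFC 4647), and lookup drops one subtag at a time until it finds an entry,
ending at a root entry. The function names, document list, and locale table
are hypothetical placeholders.

```python
def matches(language_range, tag):
    """Matching: True if `tag` falls within `language_range`.

    The range matches when it equals the tag or is a prefix of it that
    ends at a subtag boundary. Comparison is case-insensitive.
    """
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def filter_documents(language_range, documents):
    """Return the names of all documents whose tag matches; may be empty."""
    return [name for name, tag in documents if matches(language_range, tag)]


def lookup(tag, data):
    """Lookup: always return something, falling back one subtag at a time.

    sh-Latn-IT -> sh-Latn -> sh -> root. Keys must use the same case as `tag`.
    """
    candidate = tag
    while candidate:
        if candidate in data:
            return data[candidate]
        # Drop the last subtag; "sh".rpartition("-")[0] is "", ending the loop.
        candidate = candidate.rpartition("-")[0]
    return data["root"]


# Hypothetical documents, each tagged with a language tag.
documents = [
    ("doc1", "sh-Latn-IT"),
    ("doc2", "sh-Cyrl"),
    ("doc3", "sr-Latn"),
    ("doc4", "sh"),
]
print(filter_documents("sh", documents))          # ['doc1', 'doc2', 'doc4']
print(filter_documents("sh-Latn-IT", documents))  # ['doc1']
print(filter_documents("sh-Latn-FR", documents))  # []

# Hypothetical locale data: only 'sh' and the root have entries.
locale_data = {
    "root": "root data",
    "sh": "Serbian data in Latin script",
}
print(lookup("sh-Latn-IT", locale_data))  # Serbian data in Latin script
```

The key difference is the empty case: filtering may legitimately return
nothing, while lookup must always produce some data, which is why whatever is
stored under 'sh' has to be a single, consistent choice of script.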

This is no different from, for example, using American English as the data
associated with 'en' for lookup purposes. Any distinctions by country would
be stored separately in en-AU, en-CA, en-IE, etc., and, where available,
would be found by a lookup. But if someone came in with en-JP and there was
no separate data source for it, the lookup would fall back to the data
associated with 'en', which would be American English. That doesn't imply
that CLDR treats 'en' as equivalent to 'en-US' in terms of the semantics of
the tags; it does not.
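The en-JP case can be sketched the same way, again with hypothetical helper
names and placeholder data, and with the same simplified rules (prefix
matching on subtag boundaries, lookup by dropping one subtag at a time). A
lookup for en-JP returns whatever is stored under 'en', yet as a matching
range 'en' still covers en-AU while 'en-US' does not.

```python
def matches(language_range, tag):
    # Matching: the range equals the tag or is a prefix ending at a hyphen.
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def lookup(tag, data):
    # Lookup: truncate subtags from the right until an entry is found.
    while tag and tag not in data:
        tag = tag.rpartition("-")[0]
    return data[tag] if tag else data["root"]


# Hypothetical locale data: 'en' carries American English content,
# and only some countries have entries of their own.
locale_data = {
    "root": "root data",
    "en": "American English data",
    "en-AU": "Australian English data",
    "en-CA": "Canadian English data",
    "en-IE": "Irish English data",
}

# Lookup: en-JP has no entry, so it falls back to the data stored under 'en'.
print(lookup("en-JP", locale_data))  # American English data
print(lookup("en-AU", locale_data))  # Australian English data

# Matching: 'en' and 'en-US' are still different ranges.
print(matches("en", "en-AU"))     # True
print(matches("en-US", "en-AU"))  # False
```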

Mark

----- Original Message -----
From: "Mark Davis" <mark.davis@jtcsv.com>
To: "Unicode Discussion" <unicode@unicode.org>; "Michael Everson"
<everson@evertype.com>
Sent: Monday, March 14, 2005 22:37
Subject: Re: Serbian-Latin "sh" alias and ISO-639-1 within CLDR

> Well, reality appears to be rather fluid. Mysteriously, the single language
> Serbo-Croatian suddenly split into two languages about ten years ago. We may
> someday look back on the day when the Californian language split off from
> English after the War of Pacific Secession.
>
> Mark
>
> ----- Original Message -----
> From: "Michael Everson" <everson@evertype.com>
> To: "Unicode Discussion" <unicode@unicode.org>
> Sent: Monday, March 14, 2005 18:08
> Subject: Re: Serbian-Latin "sh" alias and ISO-639-1 within CLDR
>
>
> > At 18:00 -0800 2005-03-14, Mark Davis wrote:
> > >Let me try this one more time. "sh" was fairly widely used to stand for
> > >Serbian written in Latin.
> >
> > What?
> > Where? When? By whom?
> >
> > "sh" was used to tag tens or hundreds of thousands of books worldwide
> > in "Serbo-Croatian", which means Serbian or Croatian, in Latin or
> > Cyrillic, for DECADES. There are far more examples of hr-Latn
> > and sr-Cyrl that were tagged as sh than there are of either hr-Cyrl
> > or sr-Latn.
> >
> > >We do not defend that usage, but for backwards compatibility we've
> > >maintained it in CLDR. Our recommendation, as I have stated, is to
> > >use sr-Latn instead of "sh" for that usage.
> >
> > That particular recommendation seems to have little to do with reality.
> > --
> > Michael Everson * * Everson Typography * * http://www.evertype.com

This archive was generated by hypermail 2.1.5 : Wed Mar 16 2005 - 12:28:00 CST