Re: Language identifier proposals request

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Sun Sep 03 1995 - 07:16:32 EDT


unicode@Unicode.ORG writes:

> [elided]
>
> Asmus> To summarize: Any proposal needs to address these issues
>
> Asmus> - how the ID is designed (numeric, string, etc.)
> Asmus> - how one can tell from the id that 2 languages are substitutable
> Asmus> - how the ID is incorporated into a data stream (default
> Asmus> protocol)
> Asmus> - suggested initial assignments of ID values
>
> Asmus, thanks for the description.
>
> What was going to be our proposal to the UTC will now simply be
> submitted to the mailing list after an internal review. Rick McGowan
> kindly reminded me that language identification is not really within
> the scope of the Unicode Standard.
>
> Our approach details three of the four points you mentioned above,
> but doesn't really discuss "substitutability" per se. That kind of
> information can easily be encoded in our approach.
>
> I can see the necessity of "substitutability", particularly in the
> context of many commercial systems that provide language support in
> modular form.

There is discussion within ISO and CEN on how to handle this
"language substitutability". As presented here, it concerns
what happens when messages for a locale are not available.
For example, when Danish messages are not available, a user
might want to use the Norwegian messages instead, then Swedish,
then English and then German. What we are considering is using
a kind of "locale path" to locate the most suitable locale.
Danish, Norwegian, Swedish, English and German are all part of
the same family of languages, but at different proximities, and
the proximity depends on the user and his/her abilities.
Normally, though, each of these languages is considered distinct
and not simply substitutable.
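
To make the "locale path" idea concrete, here is a minimal sketch
in Python. It is only an illustration, not anything agreed in ISO
or CEN: the function name resolve_locale and the set of installed
message catalogues are assumptions made up for the example. The
path is the user's own ordered preference, and the first entry
that has messages available wins.

```python
# Hypothetical sketch of a "locale path" lookup. The function name and the
# catalogue set below are illustrative assumptions, not part of any standard.

def resolve_locale(locale_path, available):
    """Return the first entry of locale_path found in available, else None."""
    for code in locale_path:
        if code in available:
            return code
    return None


# The user's order of preference, as ISO 639 two-letter codes:
# Danish, Norwegian, Swedish, English, German.
locale_path = ["da", "no", "sv", "en", "de"]

# Assumed installed message catalogues; Danish and Norwegian are missing.
available = {"sv", "en", "de", "fr"}

print(resolve_locale(locale_path, available))  # prints "sv"
```

Because the path belongs to the user rather than to the language,
two users can order the same languages differently, which matches
the point that proximity depends on the user.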

The language codes from ISO 639 that I distributed earlier are
normally extended with a country code from ISO 3166, so you can
get "American English" or "British English" by saying en_US or
en_GB respectively (in POSIX locale notation; a similar notation
is being proposed for the Internet).
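
As a small illustration of the notation, the sketch below splits a
name such as en_US into its ISO 639 language part and its ISO 3166
country part. The helper name split_locale is an assumption for
the example, and the sketch deliberately ignores the optional
codeset and modifier parts that a full POSIX locale name may carry.

```python
# Hypothetical helper: split a "language_COUNTRY" name into its two parts.
# Simplified on purpose; codeset (".ISO8859-1") and modifier ("@...") suffixes
# of full POSIX locale names are not handled here.

def split_locale(name):
    """Return (language, country); country is None when no '_' is present."""
    language, sep, country = name.partition("_")
    return language, (country if sep else None)


print(split_locale("en_US"))  # prints ('en', 'US')
print(split_locale("en_GB"))  # prints ('en', 'GB')
print(split_locale("da"))     # prints ('da', None)
```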

I think that if Unicode proposes a new standard on this, it should
be aligned as much as possible with ISO standards.

Keld


