Re: lowercased Unicode language tags ? (was: ISO 15924)

From: Doug Ewell (dewell@adelphia.net)
Date: Mon May 03 2004 - 01:26:40 CDT


Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

> Lettercase can make a difference here to differentiate a script and a
> region code. Suppose that there's a ISO3166-2 code "LATN" (a region
> code "TN" in Lao?), how will you interpret "lo-LATN"?
>
> Is it the Lao language spoken in that particular region of Lao (the
> country), and written with its natural script, or is it "standard" Lao
> written with the Latin script ?

As I mentioned before, this will never happen, because even if an ISO
3166-2 region code did appear in a language tag (by registration, as
John Cowan points out), the country and region would still be separated
by a hyphen. The hypothetical region in Laos would be coded "LA-TN",
and so the whole language tag would be "lo-LA-TN", distinguishable from
"lo-Latn" regardless of capitalization.

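To make the point concrete, here is a minimal Python sketch of splitting a tag on hyphens before looking at any subtag. The classification by length alone (four letters for a script, two letters for a country or subdivision) is a simplifying assumption for illustration, not the full language-tag grammar, and the function name classify is hypothetical.

```python
def classify(tag):
    """Label each hyphen-separated subtag of a language tag.

    Assumption for illustration only: subtags are classified purely by
    length and position, which is far cruder than the real tag syntax.
    """
    # Case is folded first, so capitalization plays no part in the result.
    subtags = tag.lower().split("-")
    labeled = [("language", subtags[0])]
    seen_country = False
    for sub in subtags[1:]:
        if len(sub) == 4:
            labeled.append(("script", sub))
        elif len(sub) == 2 and not seen_country:
            labeled.append(("country", sub))
            seen_country = True
        elif len(sub) == 2:
            labeled.append(("subdivision", sub))
        else:
            labeled.append(("other", sub))
    return labeled


print(classify("lo-LA-TN"))
print(classify("LO-LATN"))
```

Under these assumptions the hyphen alone keeps the two tags apart, whatever the lettercase.
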
There is in fact no such region as LA-TN, but just for fun, I compiled a
list of the codes that would be ambiguous if Philippe's hyphenless
assumption were true. It's not a long list.

CA-NS: Canada, Nova Scotia
Cans: Unified Canadian Aboriginal Syllabics

IT-AL: Italy, Alessandria
Ital: Old Italic (Etruscan, Oscan, etc.)

This might qualify as the first recorded frivolous use of ISO 15924
codes.
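
For anyone who wants to repeat the exercise, the comparison can be sketched in Python as below. The two sample lists are a small hand-picked subset chosen for illustration, not the full ISO 3166-2 or ISO 15924 code lists, and the function name hyphenless_collisions is hypothetical.

```python
# Small illustrative samples only; the real code lists are much longer.
SAMPLE_SUBDIVISIONS = ["CA-NS", "CA-ON", "IT-AL", "IT-RM"]
SAMPLE_SCRIPTS = ["Cans", "Ital", "Latn", "Cyrl"]


def hyphenless_collisions(subdivisions, scripts):
    """Return (subdivision, script) pairs that match once the hyphen is
    dropped and lettercase is ignored."""
    script_by_key = {s.lower(): s for s in scripts}
    hits = []
    for code in subdivisions:
        key = code.replace("-", "").lower()
        if key in script_by_key:
            hits.append((code, script_by_key[key]))
    return hits


print(hyphenless_collisions(SAMPLE_SUBDIVISIONS, SAMPLE_SCRIPTS))
```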

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/


