Re: lowercased Unicode language tags ? (was:ISO 15924)

From: John Cowan
Date: Sun May 02 2004 - 21:36:41 CDT

Philippe Verdy scripsit:

> Not really: Many ISO 3166-3 codes (for former countries or territories
> or those that have changed their code) are also 4 letters.
> For example ZRCD designates the former Zaïre (now Dem. Rep. of Congo),
> DDDE the former Dem. Rep. of Germany (now unified with Germany),
> BUMM is the former Kingdom of Burma (now Myanmar).

Well, those codes really code transitions, not countries: they are
structurally a pair of 2-letter 3166-1 codes, saying that what was
once ZR is now CD, what was once DD is now (part of) DE, and what
was once BU is now MM.
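
A minimal sketch of that reading, assuming every four-letter code is simply
a former 2-letter code followed by its successor (the function name
split_transition_code is hypothetical, and the real 3166-3 list may contain
cases that do not fit this simple pairing):

```python
def split_transition_code(code: str) -> tuple[str, str]:
    """Split a four-letter code such as 'ZRCD' into (former, successor)."""
    if len(code) != 4 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected four ASCII letters, got {code!r}")
    code = code.upper()
    # First half: the withdrawn 2-letter code; second half: what replaced it.
    return code[:2], code[2:]


# The three examples from the quoted message.
for example in ("ZRCD", "DDDE", "BUMM"):
    former, successor = split_transition_code(example)
    print(f"{example}: what was once {former} is now {successor}")
```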

> And there are also ISO 3166-2 codes for administrative regions in
> countries (such as FR2B for the department of Haute-Corse in France).

I think those are usually written FR-2B, though I do not have access
to 3166-2 itself.
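
As an illustration only, here is a sketch that rewrites the run-together
form into the hyphenated one. It assumes a subdivision code is a 2-letter
country code followed by one to three letters or digits; that shape and the
function name hyphenate_subdivision are my assumptions, not taken from the
text of 3166-2.

```python
import re

# Assumed shape: two letters, an optional hyphen, then 1-3 letters or digits.
_SUBDIVISION = re.compile(r"([A-Za-z]{2})-?([A-Za-z0-9]{1,3})")


def hyphenate_subdivision(code: str) -> str:
    """Return the hyphenated form, e.g. 'FR2B' becomes 'FR-2B'."""
    match = _SUBDIVISION.fullmatch(code)
    if match is None:
        raise ValueError(f"not a recognizable subdivision code: {code!r}")
    country, subdivision = match.groups()
    return f"{country.upper()}-{subdivision.upper()}"


print(hyphenate_subdivision("FR2B"))   # run-together form from the quote
print(hyphenate_subdivision("FR-2B"))  # already hyphenated, left unchanged
```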

> Languages need not only distinctions by countries but also by regions
> in countries, if this is needed. So Catalan in the Spanish Canaries
> would use the ISO3166 code "ESCI" after the language tag "es" (the
> complete code would be "es-Latn-ESCI" or just "es-ESCI", distinct from
> "es-Latn", which could also be used for Castilian).

Catalan is not Spanish, and has its own code. RFC 3066 permits registration
of sub-country codes if needed, but they must be registered explicitly
to be used. The proposed replacement, RFC 3066bis, does not yet
allow sub-country codes.

> Lettercase can make a difference here to differentiate a script and
> a region code. Suppose that there's an ISO3166-2 code "LATN" (a region
> code "TN" in Lao?), how will you interpret "lo-LATN"?

It's not preregistered, so it can only be interpreted by looking at the
RFC 3066 registration list, which does not have it.
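
The lookup can be sketched as follows, relying on the RFC 3066 rules that
tags are case-insensitive, that a 2-letter second subtag is an ISO 3166
country code, and that a second subtag of 3 to 8 characters is meaningful
only if the whole tag has been registered. The registry contents below are
a placeholder for illustration, not a copy of the real registration list,
and the function name is hypothetical.

```python
# Placeholder stand-in for the registration list; consult the real one.
EXAMPLE_REGISTRY = frozenset({"de-1996", "en-scouse"})


def interpret_second_subtag(tag: str, registry: frozenset) -> str:
    """Classify the second subtag of an RFC 3066 language tag."""
    lowered = tag.lower()  # tags compare case-insensitively
    parts = lowered.split("-")
    if len(parts) < 2:
        return "no second subtag"
    second = parts[1]
    if len(second) == 2 and second.isascii() and second.isalpha():
        return "country code"
    if 3 <= len(second) <= 8:
        # Letter case cannot help here: only registration gives a meaning.
        return "registered" if lowered in registry else "unregistered"
    return "not assignable"


print(interpret_second_subtag("lo-LATN", EXAMPLE_REGISTRY))
print(interpret_second_subtag("es-ES", EXAMPLE_REGISTRY))
```

Under these assumptions "lo-LATN" and "lo-latn" are the same tag, and both
come back as unregistered.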

Here lies the Christian,                        John Cowan
judge, and poet Peter,
Who broke the laws of God
and man and metre.
