Language subtags (was: RE: [OT] Reusing the same property)

From: Doug Ewell <doug_at_ewellic.org>
Date: Thu, 01 Sep 2011 10:31:47 -0700

Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

> 2011/8/31 Doug Ewell <doug_at_ewellic.org>:
>> Philippe Verdy wrote:
>>> the existing
>>> BCP 47 implementations, but that would limit the may-be future
>>> extension of ISO 639 to longer codes): ISO 639 could immediately say
>>> that it will never allocate any language code (of any length)
>>> starting by qa..qz.
>>
>> Not possible; 'qu' is already taken for Quechua. And not necessary:
>> 'qaa' through 'qtz' are reserved.
>
> I said using the prefixes starting by "qa..qt",

This was a direct quote from your post of Wed, 31 Aug 2011 21:58:25
+0200 (http://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0456.html).
But I'll assume it was a typo for "qa..qt"; you did mention this
shorter range in other posts.

> these prefixes are not
> supposed to be used alone, there must be additional letters. so this
> does not apply to "qu" alone (yes, assigned to the Quechua
> macrolanguage, or isn't it a language subfamily ?).

From here on, I assume you are asking about BCP 47, not about any part
of ISO 639. BCP 47 uses the IANA Language Subtag Registry, which uses
ISO 639 as a primary source, but adds constraints. As one example, when
an alpha-2 code element (from 639-1) exists for a given language, a BCP
47 subtag exists only for that alpha-2 code element and not for the
corresponding alpha-3 code element. So for French, you can use only
'fr', and not 'fre' (from 639-2/B) or 'fra' (from 639-2/T and
639-3).
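
As a rough illustration of that constraint, here is a minimal Python
sketch. The table ALPHA3_TO_ALPHA2 and the function name
bcp47_language_subtag are hypothetical, and the table is a hand-written
excerpt, not data read from the Registry; a real implementation would
load the IANA Language Subtag Registry instead.

```python
# Hypothetical, hand-written excerpt for illustration only. The
# authoritative data is the IANA Language Subtag Registry.
ALPHA3_TO_ALPHA2 = {
    "fre": "fr",  # French, ISO 639-2/B
    "fra": "fr",  # French, ISO 639-2/T and 639-3
    "que": "qu",  # Quechua, ISO 639-2
}


def bcp47_language_subtag(code: str) -> str:
    """Return the subtag BCP 47 would use for an ISO 639 code element.

    If an alpha-2 code element exists (per the excerpt above), it is
    returned; otherwise the lowercased input is returned unchanged.
    """
    code = code.lower()
    return ALPHA3_TO_ALPHA2.get(code, code)
```

Under these assumptions, both 'fre' and 'fra' map to 'fr', while a
code with no alpha-2 equivalent in the table passes through untouched.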

BCP 47 language subtags do not have "prefixes." An ISO-based language
subtag is either 2 or 3 letters long. There is no correlation, explicit
or implicit, between a 2-letter language subtag and any 3-letter subtags
that begin with those same two letters.

ISO 639-3 does classify Quechua as a macrolanguage, but that doesn't
affect code allocation; macrolanguages are assigned code elements and
subtags just like any other language. It is often useful to be able to
specify, say, "Quechua" in a tag instead of one of the many specific
varieties of Quechua, such as Chimborazo Highland Quichua or Yanahuanca
Pasco Quechua; this in fact is why the concept of "macrolanguage"
exists.

> But I admit that there's an additional caveat: BCP47 opens all codes
> with 5 to 8 characters to possible registration in the IANA registry.
> I have not checked if there were some registration of language tags
> starting by "qa..qt" in the IANA registry, but there's apparently no
> policy defined to forbid such registration.

BCP 47 language subtags of 2 and 3 letters correspond to code elements
assigned in some part of ISO 639.

ISO 639-1, as stated earlier, has assigned 'qu' to Quechua. This is
reflected in the Registry. I don't have a copy of 639-1 and don't know
if it reserves 'qa..qt' or any other range. The 639-2 Web site, which
lists 639-1 allocations, doesn't mention any such reservation.

ISO 639-2 and 639-3 have defined 'qaa' through 'qtz' as "Reserved for
local use," which is reflected in the Registry as "Private use." BCP 47
explains the use of these subtags as an alternative to the "x-"
mechanism. One advantage, as you pointed out elsewhere, is that the
resulting tag can be parsed like a normal tag; the region 'ZW' in
"qaa-ZW" explicitly means Zimbabwe.
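
To make the range concrete, the following Python sketch tests whether a
language subtag falls in 'qaa' through 'qtz' and splits a simple
"language-REGION" tag. Both function names are hypothetical, and the
splitter is deliberately naive: it assumes the second subtag, if
present, is a region and ignores scripts, variants, and extensions.

```python
import re


def is_private_use_language(subtag: str) -> bool:
    """True for 'qaa' through 'qtz': 'q', then a-t, then a-z."""
    return re.fullmatch(r"q[a-t][a-z]", subtag.lower()) is not None


def split_language_region(tag: str):
    """Split a simple 'language-REGION' tag into its two parts.

    Assumption: the second subtag, if any, is a region. Real BCP 47
    tags may also carry extlang, script, variant, and extension
    subtags, which this sketch does not handle.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else None
    return language, region
```

With "qaa-ZW", the language part is private use but the region part is
still an ordinary region subtag, which is the advantage over "x-"
described above. Note that 'qu', 'que', and 'qwe' all fall outside the
private-use range.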

ISO 639-2 has assigned 'que' to Quechua and ISO 639-5 has assigned 'qwe'
to "Quechuan (family)." ISO 639-3 has assigned more than 50 code
elements in the non-private range beginning with 'q', many (but not all)
of which are for varieties of Quechua. The Registry reflects all of
these assignments except 'que' (because Quechua in BCP 47 is 'qu').

> And your "not necessary" comment does not apply here too: it just
> assigns the 3-letter codes for local use, not the longer codes which
> are only reserved for the 4-letter codes, but not assigned for private
> use (and there's also no provision given in ISO 639 to protect an
> encoding space for 5-letter codes or longer, as they are now usable
> for IANA registration).

4-letter language subtags are reserved, and will remain reserved unless
and until BCP 47 is updated (via a new RFC) to make use of them. I
don't care to speculate on their future allocation or use.

If and when language subtags of 5 to 8 letters are registered, there
will be no restriction (as far as I can tell) on subtags beginning with
'q' or any other letter or sequence.
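
The length-based distinctions in this message can be summarized in a
short Python sketch. The function name and the returned labels are
hypothetical; the sketch looks only at length and ASCII letters, and it
does not consult the Registry or handle the "x" private-use singleton.

```python
def classify_language_subtag(subtag: str) -> str:
    """Classify a primary language subtag by its length alone."""
    if not (subtag.isascii() and subtag.isalpha()):
        return "not a language subtag"
    n = len(subtag)
    if n in (2, 3):
        # Corresponds to a code element in some part of ISO 639.
        return "ISO 639-based"
    if n == 4:
        # Reserved unless and until BCP 47 is updated.
        return "reserved"
    if 5 <= n <= 8:
        # Open to registration; no restriction on the leading letters.
        return "registered"
    return "not a language subtag"
```

Nothing in this classification depends on the first letters of the
subtag, which is the point: a 5- to 8-letter subtag beginning with 'q'
is treated like any other.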

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell
Received on Thu Sep 01 2011 - 12:36:42 CDT
