Re: Language subtags (was: RE: [OT] Reusing the same property) from Philippe Verdy on 2011-09-01 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 1 Sep 2011 20:17:06 +0200

In fact, my post about qa..qz contained a one-letter error: I only
meant to speak about qa..qt (as I had stated previously and everywhere
else in this thread), before anyone felt the need to correct me.

Well, I know that this is now going off topic, because someone else
spoke about "www" (before that I had only spoken about the general
need for any open encoding standard that wants to be universal to
assign a private-use area).

From this came a desire to have a larger space than just qa..qt
union qaa..qtz, for easier (algorithmic) mapping of local-use codes
(both in ISO 639 and BCP 47).
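
To make the size of the existing 3-letter space concrete, here is a minimal Python sketch of a membership test for the qaa..qtz range. The function name is hypothetical, and the check only looks at the shape of the subtag; it does not consult the IANA registry.

```python
def is_private_use_language(subtag):
    """Return True if subtag lies in the 3-letter range qaa..qtz.

    Purely a range check on the letters; no registry lookup is done.
    """
    s = subtag.lower()
    return len(s) == 3 and s.isascii() and s.isalpha() and "qaa" <= s <= "qtz"


# The second letter runs over a..t (20 letters) and the third over a..z
# (26 letters), so the range holds 20 * 26 = 520 codes.
print(is_private_use_language("qaa"))  # True
print(is_private_use_language("qtz"))  # True
print(is_private_use_language("que"))  # False: outside the reserved range
print(is_private_use_language("qu"))   # False: 2-letter codes are not covered
```

A space of 520 codes is enough for a small alias table but leaves little room for structured, algorithmic mappings, which is the limitation under discussion.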

I had also wanted to show that the "x-" prefix in BCP 47 makes the
language tag not parsable like generic structured tags (which are also
extensible to support locale tags, using extensions such as the one
using the "u" subtag defined by Unicode, mostly for the CLDR, e.g. to
encode collation options or other locale conventions). Using the
BCP 47 "x-" prefix does not permit those extensions, because "x-"
BCP 47 language tags have no structure.
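
The difference can be sketched in Python with a deliberately simplified tag splitter. This is an illustration under assumptions, not a conforming BCP 47 parser: the function name split_tag is hypothetical, subtags are lowercased for simplicity, and extlang, variants, and grandfathered tags are ignored.

```python
import re


def split_tag(tag):
    """Split a language tag into a few structured parts (simplified)."""
    subtags = tag.lower().split("-")
    result = {"language": None, "script": None, "region": None,
              "extensions": {}, "private": []}

    # A tag that begins with "x" is private use as a whole: every
    # following subtag is opaque, so no region or extension is found.
    if subtags[0] == "x":
        result["private"] = subtags[1:]
        return result

    result["language"] = subtags[0]
    i = 1
    if i < len(subtags) and re.fullmatch(r"[a-z]{4}", subtags[i]):
        result["script"] = subtags[i]
        i += 1
    if i < len(subtags) and re.fullmatch(r"[a-z]{2}|[0-9]{3}", subtags[i]):
        result["region"] = subtags[i]
        i += 1

    while i < len(subtags):
        s = subtags[i]
        if s == "x":
            # Trailing private-use section: the rest is opaque.
            result["private"] = subtags[i + 1:]
            break
        if len(s) == 1:
            # Extension singleton (e.g. "u"): collect its subtags up to
            # the next one-character subtag.
            j = i + 1
            while j < len(subtags) and len(subtags[j]) > 1:
                j += 1
            result["extensions"][s] = subtags[i + 1:j]
            i = j
        else:
            i += 1  # variants are skipped in this sketch
    return result


structured = split_tag("qaa-ZW-u-co-phonebk")
opaque = split_tag("x-mylang-ZW-u-co-phonebk")
print(structured["region"], structured["extensions"])  # zw {'u': ['co', 'phonebk']}
print(opaque["region"], opaque["extensions"])          # None {}
```

With a private-use language subtag such as "qaa", the region and the "u" extension are still recognized; with a leading "x-", the same subtags end up in the opaque private-use list.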

And that's why I spoke about two alternatives:
- using another singleton letter "q" (yes, in BCP 47 only), followed by
one subtag, to create arbitrary local-use language tags that would
still remain parsable and would support the "u" extension mechanism;
- using ranges of codes starting with qa..qt of arbitrary longer lengths
(not limited to 2-3 letters as now), which means a change both in ISO
639 (for code allocation) and BCP 47 (to restrict 5- to 8-letter codes
that are NOT freely usable for local use, but still open for
registration, so that the IANA registry could accept a registration
of 5- to 8-letter codes starting with qa..qt).

I hope this summary correctly represents what I wanted to show, because
once again the intent has been misunderstood and some people on this
list were assuming things that I did not intend to request.

In fact I have not requested anything, just spoken about the existing
possibilities that would permit an application to use a comfortable
space for its local uses: one in which it can easily remap some
unusable codes to a PUA space and create aliases there that would be
recognized automatically by this local application as such (i.e. as
aliases of the standard language codes), easing the interoperability
of this application with the rest of the world, even if it needs to
use local-use codes.
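
One way such an alias scheme might look, as a hedged Python sketch: the class LocalAliases and the helper private_code are hypothetical names, and the example uses the existing qaa..qtz range only because it is the space available today.

```python
import string


def private_code(index):
    """Return the index-th code of qaa..qtz (0 -> 'qaa', 519 -> 'qtz')."""
    if not 0 <= index < 20 * 26:
        raise ValueError("index outside the qaa..qtz range")
    second, third = divmod(index, 26)
    letters = string.ascii_lowercase
    return "q" + letters[second] + letters[third]


class LocalAliases:
    """Two-way table between standard tags and local private-use codes."""

    def __init__(self):
        self._to_private = {}
        self._to_standard = {}

    def alias_for(self, standard_tag):
        """Return the private-use alias for a tag, allocating one if needed."""
        if standard_tag not in self._to_private:
            code = private_code(len(self._to_private))
            self._to_private[standard_tag] = code
            self._to_standard[code] = standard_tag
        return self._to_private[standard_tag]

    def resolve(self, code):
        """Map an alias back to its standard tag; other codes pass through."""
        return self._to_standard.get(code, code)


aliases = LocalAliases()
print(aliases.alias_for("zh-Hant"))  # qaa
print(aliases.alias_for("sr-Latn"))  # qab
print(aliases.resolve("qab"))        # sr-Latn
print(aliases.resolve("fr"))         # fr
```

Resolving every alias back to its standard tag before data leaves the application is what keeps the local codes from leaking into interchange.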

Philippe.

2011/9/1 Doug Ewell <doug_at_ewellic.org>:
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
>> 2011/8/31 Doug Ewell <doug_at_ewellic.org>:
>>> Philippe Verdy wrote:
>>>> the existing
>>>> BCP 47 implementations, but that would limit the may-be future
>>>> extension of ISO 639 to longer codes): ISO 639 could immediately say
>>>> that it will never allocate any language code (of any length)
>>>> starting by qa..qz.
>>>
>>> Not possible; 'qu' is already taken for Quechua. And not necessary:
>>> 'qaa' through 'qtz' are reserved.
>>
>> I said using the prefixes starting by "qa..qt",
>
> This was a direct quote from your post of Wed, 31 Aug 2011 21:58:25
> +0200 (http://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0456.html).
> But I'll assume it was a typo for "qa..qt"; you did mention this
> shorter range in other posts.
>
>> these prefixes are not
>> supposed to be used alone, there must be additional letters. so this
>> does not apply to "qu" alone (yes, assigned to the Quechua
>> macrolanguage, or isn't it a language subfamily ?).
>
> From here on, I assume you are asking about BCP 47, not about any part
> of ISO 639. BCP 47 uses the IANA Language Subtag Registry, which uses
> ISO 639 as a primary source, but adds constraints. As one example, when
> an alpha-2 code element (from 639-1) exists for a given language, a BCP
> 47 subtag exists only for that alpha-2 code element and not for the
> corresponding alpha-3 code element. So for French, you can only use
> 'fr' for French and not 'fre' (from 639-2/B) or 'fra' (from 639-2/T and
> 639-3).
>
> BCP 47 language subtags do not have "prefixes." An ISO-based language
> subtag is either 2 or 3 letters long. There is no correlation, explicit
> or implicit, between a 2-letter language subtag and any 3-letter subtags
> that begin with those same two letters.
>
> ISO 639-3 does classify Quechua as a macrolanguage, but that doesn't
> affect code allocation; macrolanguages are assigned code elements and
> subtags just like any other language. It is often useful to be able to
> specify, say, "Quechua" in a tag instead of one of the many specific
> varieties of Quechua, such as Chimborazo Highland Quichua or Yanahuanca
> Pasco Quechua; this in fact is why the concept of "macrolanguage"
> exists.
>
>> But I admit that there's an additional caveat: BCP47 opens all codes
>> with 5 to 8 characters to possible registration in the IANA registry.
>> I have not checked if there were some registration of language tags
>> starting by "qa..qt" in the IANA registry, but there's apparently no
>> policy defined to forbid such registration.
>
> BCP 47 language subtags of 2 and 3 letters correspond to code elements
> assigned in some part of ISO 639.
>
> ISO 639-1, as stated earlier, has assigned 'qu' to Quechua. This is
> reflected in the Registry. I don't have a copy of 639-1 and don't know
> if it reserves 'qa..qt' or any other range. The 639-2 Web site, which
> lists 639-1 allocations, doesn't mention any such reservation.
>
> ISO 639-2 and 639-3 have defined 'qaa' through 'qtz' as "Reserved for
> local use," which is reflected in the Registry as "Private use." BCP 47
> explains the use of these subtags as an alternative to the "x-"
> mechanism. One advantage, as you pointed out elsewhere, is that the
> resulting tag can be parsed like a normal tag; the region 'ZW' in
> "qaa-ZW" explicitly means Zimbabwe.
>
> ISO 639-2 has assigned 'que' to Quechua and ISO 639-5 has assigned 'qwe'
> to "Quechuan (family)." ISO 639-3 has assigned more than 50 code
> elements in the non-private range beginning with 'q', many of which (but
> not all) are for varieties of Quechua. The Registry reflects all of
> these assignments except 'que' (because Quechua in BCP 47 is 'qu').
>
>> And your "not necessary" comment does not apply here too: it just
>> assigns the 3-letter codes for local use, not the longer codes which
>> are only reserved for the 4-letter codes, but not assigned for private
>> use (and there's also no provision given in ISO 639 to protect an
>> encoding space for 5-letter codes or longer, as they are now usable
>> for IANA registration).
>
> 4-letter language subtags are reserved, and will remain reserved unless
> and until BCP 47 is updated (via a new RFC) to make use of them. I
> don't care to speculate on their future allocation or use.
>
> If and when language subtags of 5 to 8 letters are registered, there
> will be no restriction (as far as I can tell) on subtags beginning with
> 'q' or any other letter or sequence.
>
> --
> Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
> www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell 
Received on Thu Sep 01 2011 - 13:20:06 CDT
