RE: [OT] Reusing the same property (was: RE: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0)

From: Doug Ewell
Date: Wed, 31 Aug 2011 13:23:58 -0700

Philippe Verdy wrote:

>> Or you could actually follow BCP 47, and use "x-www" instead.
> No, because locale tags in BCP 47

BCP 47 specifies language tags. They are sometimes used to identify
locales, but that is not their primary use case.

> starting by the "x" singleton
> subtags are not parsable to differentiate a language, a region, and a
> script (as well as other Unicode "u" extensions). They just identify a
> locale as a whole.

Fine, then use a reserved code element (ISO 639) or private-use language
subtag (BCP 47) in the range from 'qaa' to 'qtz'. There's still no need
to change the design.
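
To make the range concrete, here is a minimal Python sketch of a check for
the private-use range. The function name is invented for illustration and
is not part of BCP 47, ISO 639, or any library; the sketch assumes only
that subtags are compared case-insensitively.

```python
def is_private_use_language(subtag: str) -> bool:
    """Return True if subtag lies in the private-use range 'qaa'..'qtz'.

    Hypothetical helper for illustration only.
    """
    s = subtag.lower()  # language subtags are case-insensitive
    # Exactly three ASCII letters, compared lexicographically to the bounds.
    return len(s) == 3 and s.isascii() and s.isalpha() and "qaa" <= s <= "qtz"


print(is_private_use_language("qaa"))  # True
print(is_private_use_language("qtz"))  # True
print(is_private_use_language("qua"))  # False, just past the range
print(is_private_use_language("qu"))   # False, two letters
```

Because the range is contiguous, a plain string comparison is enough; no
table lookup is needed.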

> Anyway, it's highly probable that if ISO 639 starts allocating
> 4-letter codes, it will also include the space qaaa-qtzaa as a private
> use area (just like the existing spaces qaa-qtz and qa-qt). But let's
> not anticipate what will be in this may-be future extension of ISO
> 639.

No, let's not.

> The "x" singleton space of BCP 47 will then remain for applications
> that don't want to identify specific languages; it is clearly not
> ideal to use this "x" singleton for identifying languages but not for
> the other separable identification properties contained in a locale
> identifier.

I don't know what this means. Private-use tags starting with "x-"
cannot be reliably and algorithmically parsed into subtags (just like
all language tags before RFC 4646), but, contrary to what you seem to
imply, there are no real limits on what language information they can
convey; you can write "x-navi-as-spoken-in-hometree-on-pandora" if you
like.
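
As a rough illustration, the Python sketch below checks only the syntax of
an "x-" tag, assuming the BCP 47 rule that each private-use subtag is one
to eight ASCII letters or digits. The function name is made up for this
example. The pieces it returns are opaque: nothing marks any of them as a
language, script, or region.

```python
import re
from typing import List, Optional

# "x" followed by one or more subtags of 1 to 8 ASCII alphanumerics.
_PRIVATE_USE = re.compile(r"[xX](?:-[A-Za-z0-9]{1,8})+")


def private_use_subtags(tag: str) -> Optional[List[str]]:
    """Return the opaque subtags of an 'x-' tag, or None if it is malformed.

    Hypothetical helper for illustration only.
    """
    if _PRIVATE_USE.fullmatch(tag) is None:
        return None
    # Drop the leading "x"; the remaining pieces carry no assigned meaning.
    return tag.split("-")[1:]


print(private_use_subtags("x-www"))  # ['www']
print(private_use_subtags("x-navi-as-spoken-in-hometree-on-pandora"))
# ['navi', 'as', 'spoken', 'in', 'hometree', 'on', 'pandora']
```

Any meaning attached to those pieces exists only by private agreement
between the parties using the tag.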

> There's also another solution (that does not break much

As those of us who worked on 4646 and 5646 know, any architectural
change at this point would chew up years. "Not much" is too much.

> the existing
> BCP 47 implementations, but that would limit the may-be future
> extension of ISO 639 to longer codes): ISO 639 could immediately say
> that it will never allocate any language code (of any length) starting
> by qa..qz.

Not possible; 'qu' is already taken for Quechua. And not necessary:
'qaa' through 'qtz' are reserved.

Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14 | @DougEwell
Received on Wed Aug 31 2011 - 15:26:50 CDT
