Re: The "QU" territory/region code (was New Public Review Issue: #116 Proposed Update UTS #35 LDML)

From: Mark Davis (mark.davis@icu-project.org)
Date: Wed Nov 07 2007 - 18:10:21 CST

    This is a misreading of the text. One of the reasons for the last revision
    of BCP 47 was to make it absolutely clear when codes were valid or not. The
    valid codes are all and only those that are in
    http://www.iana.org/assignments/language-subtag-registry. EU is not there
    (as a region), thus it is not valid.
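
    As a rough illustration of "all and only those in the registry", here is a
    minimal Python sketch. The sample text is a small hand-written excerpt in
    the registry's record layout (records separated by "%%"), not the real
    file, and the names parse_records and is_valid_region are hypothetical
    helpers invented for this example.

```python
# Hand-written excerpt in the registry's record layout; an assumption for
# illustration only, NOT the full IANA language subtag registry.
SAMPLE_REGISTRY = """\
%%
Type: language
Subtag: fr
Description: French
%%
Type: region
Subtag: FR
Description: France
%%
Type: region
Subtag: QM..QZ
Description: Private use
%%
Type: region
Subtag: XA..XZ
Description: Private use
%%
Type: region
Subtag: 150
Description: Europe
"""


def parse_records(text):
    """Split registry text on '%%' and return one dict of fields per record."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that are not "Key: value"
                fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def is_valid_region(code, records):
    """True if code matches a 'Type: region' record, exactly or within a range."""
    code = code.upper()
    for record in records:
        if record.get("Type") != "region":
            continue
        subtag = record.get("Subtag", "")
        if ".." in subtag:
            low, high = subtag.split("..")
            if len(code) == len(low) and low <= code <= high:
                return True
        elif subtag == code:
            return True
    return False


records = parse_records(SAMPLE_REGISTRY)
for candidate in ("FR", "EU", "QU", "150"):
    print(candidate, is_valid_region(candidate, records))
```

    With this excerpt, EU is rejected because no region record carries that
    subtag, while QU is accepted only because it falls inside the private use
    range QM..QZ.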

    Now, personally, I agree with you that not having EU in BCP 47 is
    unproductive. However, there was unfortunately no consensus in the working
    group to add the exceptionally reserved codes. That's what forced us to use
    QU in the first place in CLDR. If BCP 47 ever changes, we'd be pleased as
    punch to deprecate QU in its favor.

    (In retrospect, I also think it was unproductive for BCP 47 to include only
    some of the M49 codes (http://unstats.un.org/unsd/methods/m49/m49regin.htm).
    It wouldn't hurt to add the very few M49 codes that are excluded from
    BCP 47, even if deprecated, if only to make testing and cross mapping
    easier.)

    Mark

    On 11/7/07, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
    >
    > Rick McGowan wrote:
    > > The Unicode CLDR committee is planning to release a minor version, 1.5.1,
    > > by the end of November. There are a few changes in the specification
    > > associated with this change.
    > >
    > > http://unicode.org/draft/reports/tr35/tr35.html
    > > Notable changes include:
    > > * Added C10. Likely Subtags for locale IDs or language tags.
    >
    > One problem about a "private use" territory code currently used (QU):
    >
    > [quote[TR35]]
    > 3. Identifiers
    > (...)
    > A locale ID is an extension of a language ID, and thus the structure and
    > field values are based on [BCP47]. The registry of data for that
    > successor is now being maintained by IANA. The canonical form of a locale
    > ID uses "_" instead of the "-" used in [BCP47]; however, implementations
    > providing APIs for CLDR locale IDs should treat "-" as equivalent to "_"
    > on input.
    > (...)
    > Locale Field Definitions
    > -------------- ---------- ------------------------------------------
    > Field          Allowable  Allowable values
    >                Characters
    > -------------- ---------- ------------------------------------------
    > (...)
    > territory_code ASCII      [BCP47] subtag values marked as Type:
    >                letters,   region, or any UN M.49 code that doesn't
    >                numbers    correspond to a [BCP47] region subtag.
    >                           There are three private use codes defined
    >                           in LDML:
    >                             QO  Outlying Oceania
    >                             QU  European Union
    >                             ZZ  Unknown or Invalid Territory
    >                           The private use codes from XA..XZ will
    >                           never be used by CLDR, and are thus safe
    >                           for use for other purposes by
    >                           applications using CLDR data.
    > -------------- ---------- ------------------------------------------
    > [/quote[TR35]]
    >
    > Now let's look at the normative [BCP-47] reference:
    >
    > [quote[BCP-47]]
    > 2.2.4. Region Subtags
    > (...)
    > The following rules apply to the region subtags:
    > (...)
    > 2. All two-character subtags following the primary subtag were
    > defined in the IANA registry according to the assignments found
    > in [ISO3166-1] ("Codes for the representation of names of
    > countries and their subdivisions -- Part 1: Country codes") using
    > the list of alpha-2 country codes, or using assignments
    > subsequently made by the ISO 3166 maintenance agency or governing
    > standardization bodies.
    > [/quote]
    >
    > Note that [BCP47] cites [ISO3166-1] as a source of codes, but it
    > ***forgets to list it in the list of normative references*** at the end
    > of the document. It's not very precise about which list is effectively
    > used; it just gives the name of the whole document within the text itself:
    > "Codes for the representation of names of countries and their subdivisions
    > -- Part 1: Country codes", and refers to the "list of alpha-2 country
    > codes"; it speaks about "assignments", but does not indicate their
    > normative status.
    >
    > From there, I can find this official page:
    > http://www.iso.org/iso/iso-3166-1_decoding_table where the "EU" code is in
    > yellow background described as "exceptional reservations". This links to
    > this page:
    > http://www.iso.org/iso/customizing_iso_3166-1.htm, which says:
    >
    > [quote]
    > To avoid transitional application problems and to aid users who require
    > specific additional code elements for the functioning of their coding
    > systems, the ISO 3166/MA may set aside code elements which it undertakes
    > not to use for other than specified purposes during a limited or
    > indeterminate period of time. These are called reserved code elements and
    > their use is normally restricted to the application they were reserved for.
    > (...)
    > Code elements not included in the current version of ISO 3166-1 may be
    > reserved by the ISO 3166/MA,
    > * (...)
    > * as "exceptional reservations", at the request of national ISO member
    >   bodies, governments and international organizations. This applies to
    >   certain code elements required in order to support a particular
    >   application, as specified by the requesting body and limited to such
    >   use; any further use of such code elements is subject to approval by
    >   the ISO 3166/MA.
    > [/quote]
    >
    > So [BCP47] indicates that the [ISO3166-1] country code "EU", listed in the
    > list of alpha-2 country codes for the European Union, should be used, as it
    > was reserved for an indeterminate time. BCP47 does not seem to restrict the
    > use of alpha-2 codes that were "exceptionally reserved".
    >
    > For [ISO3166-1], the code "EU" is an exceptional reservation; its use in
    > LDML (if it is to become an international standard) would conform to the
    > needed "support for a particular application". All that is needed is that
    > Unicode requests approval from the ISO 3166/MA.
    >
    > Why is LDML using the private use code "QU", apparently in contradiction
    > with BCP47? Shouldn't it be changed to use "EU" according to the BCP47
    > recommendation and the other policy in LDML that warns against the use of
    > private use codes that can be changed at any time?
    >
    > Does Unicode want to request approval by ISO 3166/MA for the use of the
    > "EU" code in LDML and CLDR (as indicated in ISO3166-1)? I think it would be
    > in the interest of many applications that already use "EU" in the
    > localization data, but NOT "QU" because it is a "user-assigned code
    > element" not meant for interchange.
    >
    > Note that [ISO3166-1] also says:
    >
    > [quote]
    > When exchanging data with users of ISO 3166-1 not connected to this
    > particular in-house application the definition of these user-assigned code
    > elements should be given.
    > [/quote]
    >
    > This is what is performed in the LDML specification, but is it enough to
    > permit interchange of data?
    >


