Re: New RFC 4645-4647 (language tags)

From: Doug Ewell
Date: Mon Sep 11 2006 - 08:58:39 CDT

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    > Note that the IANA website still does not refer to the new RFCs for
    > the "Language Tags" category on
    > (it still refers to the draft "RFC-ietf-ltru-registry-14.txt", instead
    > of RFC4645)

    Remember that the RFCs were only published three days ago. Three days
    is not a long delay for IANA to update their site. :-) I'll let them

    > The legacy registry files have, however, been updated to read "OBSOLETE":
    > * Language Tags - OBSOLETE
    > * Language Tags Directory - OBSOLETE

    This has been the case for many months now, since the new procedures
    were already in place, just without RFC numbers.

    > The new "Language Tag Extensions Registry" is still empty today (just
    > a date line).
    > It seems that there are already many <langext> subtags used (notably
    > for Chinese and Arabic spoken dialects, unless they are considered
    > variants, ordered after the script, something that I think is
    > inappropriate, given that any trans-script, other than Han and Arabic
    > scripts respectively, requires knowing the dialect to select a
    > significant language, even if those languages are unified in the most
    > common script; I would see variants used in Han mostly for making
    > distinctions between competing transcription systems or standards).

    Before you go any further, stop and read Section 3.7 of RFC 4646, which
    pertains to extensions. They are not intended for simple variations in
    language or script.

    Dialectal differences like the ones you are thinking of will be
    handled by "extended language" (not "extension") subtags in the ISO
    639-3-aware replacement for RFC 4646, currently under development.
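    To make the "extension" versus "extended language" distinction concrete,
    here is a minimal Python sketch (not taken from either RFC) that splits a
    tag into the positions defined by the RFC 4646 ABNF. It is deliberately
    simplified: it ignores grandfathered tags and private-use-only tags, and it
    does not check anything against the Registry. The helper name split_tag is
    hypothetical. A 3-letter subtag directly after a 2- or 3-letter primary
    language lands in the (currently reserved) extlang slot, whereas an
    extension is always introduced by a single-character "singleton" subtag.

```python
import re

# Simplified, case-insensitive patterns for the subtag positions in the
# RFC 4646 ABNF. Grandfathered and private-use-only tags are not handled,
# and nothing is checked against the IANA Registry.
LANGUAGE = re.compile(r"[a-z]{2,8}")            # 2-3 letters, or 4-8 letters
EXTLANG = re.compile(r"[a-z]{3}")               # reserved in RFC 4646
SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")
SINGLETON = re.compile(r"[a-wyz0-9]")           # any letter or digit except x
EXT_SUBTAG = re.compile(r"[a-z0-9]{2,8}")
PRIVATE_SUBTAG = re.compile(r"[a-z0-9]{1,8}")


def split_tag(tag: str) -> dict:
    """Split a language tag into its RFC 4646 positions (hypothetical helper)."""
    subtags = tag.lower().split("-")
    parts = {"language": None, "extlang": [], "script": None, "region": None,
             "variants": [], "extensions": {}, "privateuse": []}
    if not LANGUAGE.fullmatch(subtags[0]):
        raise ValueError(f"bad primary language subtag: {subtags[0]!r}")
    parts["language"] = subtags[0]
    i, n = 1, len(subtags)

    # Up to three 3-letter extlang subtags, only after a 2- or 3-letter language.
    while (len(parts["language"]) <= 3 and len(parts["extlang"]) < 3
           and i < n and EXTLANG.fullmatch(subtags[i])):
        parts["extlang"].append(subtags[i])
        i += 1
    if i < n and SCRIPT.fullmatch(subtags[i]):
        parts["script"] = subtags[i]
        i += 1
    if i < n and REGION.fullmatch(subtags[i]):
        parts["region"] = subtags[i]
        i += 1
    while i < n and VARIANT.fullmatch(subtags[i]):
        parts["variants"].append(subtags[i])
        i += 1

    # Extensions: each is a singleton followed by one or more 2-8 char subtags.
    while i < n and SINGLETON.fullmatch(subtags[i]):
        key = subtags[i]
        if key in parts["extensions"]:
            raise ValueError(f"duplicate extension singleton: {key!r}")
        i += 1
        values = []
        while i < n and EXT_SUBTAG.fullmatch(subtags[i]):
            values.append(subtags[i])
            i += 1
        if not values:
            raise ValueError(f"extension {key!r} has no subtags")
        parts["extensions"][key] = values

    # Private use: "x" followed by one or more 1-8 char subtags, always last.
    if i < n and subtags[i] == "x":
        i += 1
        while i < n and PRIVATE_SUBTAG.fullmatch(subtags[i]):
            parts["privateuse"].append(subtags[i])
            i += 1
        if not parts["privateuse"]:
            raise ValueError("private-use 'x' has no subtags")

    if i != n:
        raise ValueError(f"unexpected subtag: {subtags[i]!r}")
    return parts
```

    For example, split_tag("en-a-bbb-x-a-ccc") puts ["bbb"] under extension
    "a" and ["a", "ccc"] under private use, while split_tag("sr-Latn-CS")
    yields language "sr", script "latn" and region "cs" (the sketch lowercases
    everything). Dialect information would never pass through the extension
    branch.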

    > And there are important missing comments for the source of information
    > (which standard number?) in subtags for languages and regions,
    > according to the rules defined in RFC 4645. I think that these source
    > standards should be specified to provide the rule under which the
    > subtags were registered (no need to specify the version date of these
    > standards, given that this should be OK for the date of registration),
    > but it will ease tracking the dependencies of the registered subtags
    > with those standards and how this affects the stability of the
    > registry (just consider now the case of Serbia, and of Montenegro,
    > separate countries for which there's still no separate assignment,
    > except possibly the UN M.49 region numbers)

    There is no real confusion over the source of most subtags. At present:

    * 2-letter language subtags can only come from ISO 639-1.
    * 3-letter language subtags can only come from ISO 639-2.
    * Script subtags can only come from ISO 15924.
    * 2-letter region subtags can only come from ISO 3166-1.
    * 3-digit region subtags can only come from UN M.49.
    * Other subtags are registered directly through ietf-languages.

    These rules are stated in both RFC 4645 and 4646, and it's not expected
    that they will need to be repeated in the Registry for every subtag.
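    Because the source follows from a subtag's position and form alone, it can
    be computed rather than stored in each record. A minimal Python sketch of
    the rules listed above; the function name subtag_source and the "kind"
    labels are hypothetical, and the caller is assumed to already know which
    position in the tag the subtag occupies.

```python
def subtag_source(kind: str, subtag: str) -> str:
    """Map a subtag to its source, per the rules listed above (hypothetical helper).

    `kind` is the subtag's position in the tag: "language", "script",
    "region", or anything else (e.g. "variant"); it is assumed to be known.
    """
    s = subtag.lower()
    ascii_alpha = s.isascii() and s.isalpha()
    ascii_digit = s.isascii() and s.isdigit()
    if kind == "language" and ascii_alpha and len(s) == 2:
        return "ISO 639-1"
    if kind == "language" and ascii_alpha and len(s) == 3:
        return "ISO 639-2"
    if kind == "script" and ascii_alpha and len(s) == 4:
        return "ISO 15924"
    if kind == "region" and ascii_alpha and len(s) == 2:
        return "ISO 3166-1"
    if kind == "region" and ascii_digit and len(s) == 3:
        return "UN M.49"
    # Everything else (e.g. variants) is registered directly
    # through the ietf-languages review process.
    return "ietf-languages"
```

    For example, subtag_source("region", "419") returns "UN M.49", while
    subtag_source("region", "CS") returns "ISO 3166-1".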

    Doug Ewell
    Fullerton, California, USA
    RFC 4645  *  UTN #14

    This archive was generated by hypermail 2.1.5 : Mon Sep 11 2006 - 09:00:51 CDT