Re: Questions re ISO-639-1,2,3

From: Philippe Verdy
Date: Tue Aug 23 2005 - 19:15:55 CDT


    From: "JFC (Jefsey) Morfin"
    > The problem in using alpha-3 codes is that they are 3 alpha long. An IETF
    > Draft, supported by Doug and Peter, proposes a strict variation of the RFC
    > 3066 ABNF (structured format) where subtags are partly identified by their
    > size, partly by their relative position. I say "variation" because -
    > however it includes some additions (which result in changes in the RFC
    > 3066 ABNF) - it does not want to be an evolution which would permit much
    > needed other changes (IMHO) and support innovation, for reasons I will not
    > discuss here. The use of alpha-3 in that ABNF could be confusing at some
    > stage with other information, all the more than in internet protocols one
    > must not consider the case.
    > (...snip...)

    Hmmm... that is a long passage unrelated to what I was talking about.
    You are speaking about the identification of languages; I was speaking
    about the identification of countries (or territories, or their divisions
    and groupings) and their *later* use in identifying locales for
    labelling various types of content (with internal markup or external
    metadata). Seen this way, languages are only part of the problem (and are
    not necessarily involved when we speak about identifying "locales").

    The two are related only because of an old practice of identifying *some*
    language variants by the geographical area where they are spoken (but this is not
    always the best choice, given that two distinct variants of the same
    language may also cover roughly the same geographical area without clear

    That's why we now see attempts to use other attributes (most recently
    the script code).
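
    To make this concrete, here is a rough sketch (in Python) of how a tag
    can carry a script code next to the language and territory codes, with
    each subtag recognised only by its length and its position, as in the
    draft quoted above. The function name classify_subtags is hypothetical,
    and the rules are deliberately simplified: only language, script and
    region are recognised, and anything else is labelled "other".

```python
import re

def classify_subtags(tag):
    """Label each subtag of a language tag by its length and position.

    Simplified illustration, not a full parser: it recognises only a
    2-3 letter language, a 4-letter script, and a region that is either
    2 letters or 3 digits. Every other subtag is labelled "other".
    """
    parts = tag.lower().split("-")  # tags are compared case-insensitively
    result = []
    script_allowed = False
    region_allowed = False
    for index, part in enumerate(parts):
        if index == 0 and re.fullmatch(r"[a-z]{2,3}", part):
            kind = "language"
            script_allowed = region_allowed = True
        elif script_allowed and re.fullmatch(r"[a-z]{4}", part):
            kind = "script"
            script_allowed = False  # a region may still follow
        elif region_allowed and re.fullmatch(r"[a-z]{2}|[0-9]{3}", part):
            kind = "region"
            script_allowed = region_allowed = False
        else:
            kind = "other"
            script_allowed = region_allowed = False
        result.append((kind, part))
    return result

print(classify_subtags("zh-Hant-TW"))
print(classify_subtags("es-419"))
print(classify_subtags("en-usa"))
```

    Under these simplified rules a three-letter subtag in second position
    (as in "en-usa") is not taken as a territory at all, which is one way
    to see why the length of a code matters in this scheme.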

    Note also that ISO 3166 (or even BCP 47) does not help us when we
    need distinctions in locales for territories such as "Channel Islands" in
    the CLDR (this classification, made using the UN code, is still not good
    enough to distinguish between the TWO distinct currencies used in the
    bailiwicks of Jersey and Guernsey, each of which has its own local issuing
    bank and its own reserves). So if you need separate codes for Jersey and
    Guernsey, will you use the two distinct "exceptionally reserved" codes in
    ISO 3166-1 (but then how will you designate the "Channel Islands"
    collectively)?

    Now, as with languages, territories (seen as political or administrative
    geographic delimitations) are certainly just as unstable. There
    is also the concept of cultural geographic delimitations (named "pays"
    in French), which is distinct from the administrative/political idea of the
    State (État) and its administrative divisions (as defined in ISO 3166). For
    language identification, the former concept is certainly better than the
    arbitrary State delimitations and divisions (and especially at this second
    level, when there are no physical borders to separate people speaking some
