Re: Questions re ISO-639-1,2,3

From: Philippe Verdy (
Date: Mon Aug 22 2005 - 15:15:37 CDT

    From: "Peter Constable" <>
    >> From: Donald Z. Osborn []
    >> 1) It is seen as convenient to have a one-stop site for various
    >> information relevant to localization. (For my part, when assembling
    >> information on a language-by-language basis for African language
    >> localizers, I thought it useful to put relevant ISO-639 codes on the
    >> various pages
    > I have no problem with citing ISO 639 IDs for particular languages: that
    > is something that we expect to be stable. It's quite another thing,
    > however, if we're talking about a general listing of language
    > identifiers. For the latter, I feel that people should refer to
    > definitive sources: the official source or an approved mirror.

    That stability does not hold for the ISO 3166-1 alpha-2 country codes
    commonly used when designing locale codes. In contrast, the UNSD numeric
    codes and the ISO 3166-1 alpha-3 codes are much more stable.

    Isn't it time to start deprecating the use of alpha-2 codes for countries
    and territories? I see that several major sites now use the alpha-3 codes
    because they are stable and don't require updating the databases (think
    about the confusion that may have already happened with CS...)
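
    To make the CS case concrete, here is a minimal sketch in Python. The
    record layout and the helper names (index_by, ambiguous) are only
    illustrative assumptions, not part of any standard. It indexes a few
    entries by each kind of code: CS was used for Czechoslovakia and later
    reassigned to Serbia and Montenegro, while the alpha-3 and numeric codes
    of the two entities never collided.

```python
# Illustrative records only: (alpha-2, alpha-3, UN numeric, name).
# The tuple layout is an assumption made for this sketch.
RECORDS = [
    ("CS", "CSK", "200", "Czechoslovakia"),
    ("CS", "SCG", "891", "Serbia and Montenegro"),
    ("FR", "FRA", "250", "France"),
]


def index_by(records, position):
    """Group entity names by the code found at the given tuple position."""
    index = {}
    for record in records:
        index.setdefault(record[position], []).append(record[3])
    return index


def ambiguous(index):
    """Return the codes that map to more than one entity, sorted."""
    return sorted(code for code, names in index.items() if len(names) > 1)


by_alpha2 = index_by(RECORDS, 0)
by_alpha3 = index_by(RECORDS, 1)
by_numeric = index_by(RECORDS, 2)

print(ambiguous(by_alpha2))   # the reused alpha-2 code shows up here
print(ambiguous(by_alpha3))   # no alpha-3 code is shared in this sample
print(ambiguous(by_numeric))  # no numeric code is shared in this sample
```

    A database keyed on the alpha-2 column cannot tell old records from new
    ones once a code has been reassigned; one keyed on alpha-3 or numeric
    codes can.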

    One should also remember that the alpha-3 codes, and even the UNSD numeric
    codes, cover more entities than the current list of countries that have
    been assigned alpha-2 codes in ISO 3166.

    This instability of alpha-2 codes already creates a problem for Internet
    domain names (remember when .su domains had to be renamed!). This means
    that in the future, Internet domain names will most probably not change
    even if countries are renamed and their ISO 3166-1 alpha-2 codes change.
    So the set of ccTLDs will drift further and further out of sync with the
    ISO 3166-1 alpha-2 codes (and the current policy defined for Internet root
    domain names will become unsustainable; it has already been under pressure
    since the introduction of .eu, which was really needed, independently of
    the fact that the EU requested an ISO 3166-1 alpha-2 code).

    ISO 3166-1 is itself also under pressure from various organizations, such
    as WIPO for intellectual property and the Universal Postal Union, and
    there remains too little room in ISO 3166-1 for new countries or
    territories to get a meaningful alpha-2 code. (Think about which code one
    would use if Corsica ever became independent and then legitimately asked
    for an alpha-2 code. What would happen if some large federation, such as
    Russia, Mexico, Brazil, or even the USA, China or Canada, changed its
    constitution so that all its states became first autonomous, then
    independent members of a large international organization similar to the
    EU? Think about what will happen if the EU finally becomes a federal
    state. Wouldn't all Internet sites, and lots of documents containing URLs,
    need to change their domain names? What about printed documents, the
    conservation of data, and the reusability of long-term statistics?)

    One other way of improving the system would be to start thinking about
    deprecating the non-working ISO 3166-2 for country subdivisions (needed also
    for usage in locales). There's already a nearly perfect numeric system used
    by Eurostat to designate the subdivisions of countries and which works more
    reliably than ISO 3166-2 for all countries in the European Union (and
    candidate countries as well).

    It is innovative even if it is very different from the systems used
    nationally: for example, the French departments are given 3-digit codes by
    Eurostat (unlike the French INSEE system, which uses 2 digits, or 3 digits
    for overseas departments, or 1 digit and a letter in Corsica), the French
    regions are given 2-digit codes that unambiguously group departments into
    regions, and these regions are in turn grouped into macro-regions or areas
    with 1-digit codes.

    For now Eurostat still uses ISO 3166-1 to designate countries, so local
    codes are appended after the 2-letter country code. (The alternative would
    be to use the 3-digit UNSD codes, which also allow grouping countries by
    coherent regions or continents. That system is not so limited, since all
    but the first position could even be assigned letters instead of digits if
    needed.)
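
    The prefix structure described above can be sketched as follows. This is
    only an illustration under the assumptions stated in this message: a
    2-letter country code followed by up to three digits or letters, where
    each shorter prefix names the enclosing area. The function names and the
    sample code "FR101" are chosen for the example and do not claim to
    reproduce Eurostat's actual tables.

```python
def split_regional_code(code):
    """Return the chain of enclosing areas for a code such as 'FR101'.

    Assumed layout (taken from the description in this message, not from an
    official specification): 2-letter country code, then up to three
    characters for macro-region, region and department.
    """
    country, local = code[:2], code[2:]
    if len(country) != 2 or not country.isalpha():
        raise ValueError("expected a 2-letter country prefix: %r" % code)
    if not 1 <= len(local) <= 3 or not local.isalnum():
        raise ValueError("expected 1 to 3 local characters: %r" % code)
    return [country] + [country + local[:i] for i in range(1, len(local) + 1)]


def encloses(outer, inner):
    """True if the area coded 'outer' contains the area coded 'inner'."""
    return inner.startswith(outer)


chain = split_regional_code("FR101")
print(chain)                      # country, macro-region, region, department
print(encloses("FR1", "FR101"))   # same macro-region prefix
print(encloses("FR2", "FR101"))   # different macro-region prefix
```

    Because containment is a plain prefix test, statistics can be aggregated
    at any level without a separate lookup table.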

    Wouldn't such a system also work for larger countries such as the big
    federations (like USA, Russia, Australia, and China)? If so, how many digits
    and/or letters would be necessary?

    So is there a project to create a "Unicode" of geographic areas, and of
    locale codes, with a stability policy as strong as the permanent assignment
    of characters to code points?
