RE: [africa] Re: Questions: locales; CLDR process; ISO-639 (again)

From: Donald Z. Osborn
Date: Wed Mar 01 2006 - 21:13:22 CST

    Mamady, thanks for this letter. Would you be willing to elaborate on the
    technical challenges you and your colleagues see with ISO-639 as it is (and as
    proposed in the case of ISO/DIS-639-3)? (I hope it is okay with list members to
    pursue this thread.)

    Also, I need to discuss with you - offline - a possible submission of a locale
    for N'ko. I mention it publicly since I have questions re (1) advice on
    submitting locales for the same language but different scripts (coding etc.),
    and (2) tips re an RTL script like N'ko. Any help is welcome (perhaps best
    offline to Mamady and me).


    Don Osborn
    PanAfrican Localisation Project

    Quoting Mamady Doumbouya:

    > Don;
    > Your questions on locales, CLDR process, and ISO-639 are of great importance
    > to N'Ko language users, and no doubt to users of other languages on the
    > continent. For the past few months some of us on the N'Ko Technology
    > Support Group have been reviewing the impact of ISO-639-1 and ISO-639-2
    > particularly on the N'Ko languages, and the other Manden languages in
    > general. Many of us believe that the current coding structure may present
    > some serious technological challenges to African languages.
    > Many thanks for touching on a very important subject in such a timely way.
    > Mamady Doumbouya
    > N'Ko Institute
    > www.nkoinstitute,
    > -----Original Message-----
    > From: Peter Constable
    > Sent: Wednesday, March 01, 2006 1:54 PM
    > To: Donald Z. Osborn; John Cowan
    > Subject: RE: [africa] Re: Questions: locales; CLDR process; ISO-639 (again)
    >> From: Donald Z. Osborn
    >> Quoting John Cowan:
    > Hmmm... evidently some msgs from this list aren't getting past our spam
    > filters.
    >> Also I note that the locale form needs language code and country code. Not
    >> trying to make arguments here, but to understand how best to use the
    >> system and all the various codes.
    > Keep in mind that a locale is different from a language. Don't confuse the
    > need to use a country ID to reflect a regional dialect or spelling
    > differences ("language" identification) with the need to include a country
    > ID to reflect processing parameters associated with a country such as
    > default currency (locale identification). Language distinctions are always
    > part of a locale, so when a country ID is needed for language distinctions
    > the language ID can look the same as a locale ID. But there's a logical
    > distinction: locale IDs generally include a country ID since locales
    > generally have some country-based data, but not all language tags require a
    > country ID.
    >> > Work on RFC 3066ter, which will incorporate ISO 639-3 tags, has not yet
    >> > formally begun. The intention of most of the various players, however, is
    >> > to use a design in which a language encompassed by a 639-3 macrolanguage
    >> > will have a two-part language subtag, of the form zh-yue (Cantonese).
    >> > So 639-3 code elements for languages that are *not* macrolanguages will
    >> > be added directly, but code elements like yue will not: yue will only
    >> > exist in Internet language tags as part of the compound subtag zh-yue.
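The two-part subtag design described above can be sketched in Python. This is only an illustration of the proposal as stated in this thread, which had not been finalized; the table `MACROLANGUAGE_OF` and the helper names are hypothetical, and the table lists only pairs that appear in this discussion.

```python
# Hypothetical table: 639-3 code -> subtag of the encompassing macrolanguage.
# Only pairs mentioned in this thread are listed (assumption for illustration).
MACROLANGUAGE_OF = {
    "yue": "zh",   # Cantonese under Chinese
    "arz": "ar",   # Egyptian spoken Arabic under Arabic
    "ffm": "ff",
    "xpe": "kpe",
    "gkp": "kpe",
}


def language_subtag(code: str) -> str:
    """Return the language part of a tag under the proposed two-part design."""
    macro = MACROLANGUAGE_OF.get(code)
    # Encompassed languages appear only as "macro-code"; other codes stand alone.
    return f"{macro}-{code}" if macro else code


def locale_tag(code: str, country: str) -> str:
    """Append a country ID to the language part to form a locale-style tag."""
    return f"{language_subtag(code)}-{country.upper()}"


print(language_subtag("yue"))   # zh-yue
print(locale_tag("ffm", "ML"))  # ff-ffm-ML
print(language_subtag("kpe"))   # kpe
```

Under this sketch a code such as yue never appears on its own in a tag, which matches the intention stated above.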
    >> Thanks for this clarification. Actually the "nesting" of the '3 codes
    >> under a '1 or a '2 code makes a lot of sense. Two questions:
    >> 1) Can one file a locale before 3/15 using this format "ff-ffm-ML" even though
    >> the design is not yet official?
    > If you mean file a locale into CLDR, that's a question for the CLDR list,
    > not this list.
    >> Beyond that I see that there may be a lot of discussion on the roles
    >> and use of the different codes in the case of different
    >> (macro)languages. In the case of
    >> Arabic, for example, would a simple ar-EG be enough or would you need (or
    >> alternatively want to rule out) ar-arz-EG (arz=Egyptian spoken Arabic), while
    >> at the same time allowing perhaps that less widely spoken dialects in the
    >> country be noted?
    > Standard Arabic is used across Arabic-speaking countries and is generally
    > the preferred variety for text. This is what would almost certainly be used
    > in Arabic locale data. Thus, ar-EG is probably the most appropriate for this
    > case. If someone is specifically using a locale for creating and working
    > with content or resources in arz, then ar-arz-EG might be an appropriate
    > locale -- but note, it would be a different locale than ar-EG.
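To make the point concrete that ar-EG and ar-arz-EG would be different locales, here is a minimal Python sketch that splits such tags into their parts. It handles only the shapes discussed in this thread (language, an optional three-letter second subtag, an optional two-letter country ID); `ParsedTag` and `parse_tag` are hypothetical names, not part of any standard library or of CLDR.

```python
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    language: str
    extended: Optional[str]  # e.g. "arz" in ar-arz-EG
    region: Optional[str]    # e.g. "EG"


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag of the form lang[-ext][-REGION]; other subtags are rejected."""
    parts = tag.split("-")
    language = parts[0].lower()
    extended = None
    region = None
    for part in parts[1:]:
        if len(part) == 3 and part.isalpha() and extended is None and region is None:
            extended = part.lower()
        elif len(part) == 2 and part.isalpha() and region is None:
            region = part.upper()
        else:
            raise ValueError(f"unsupported subtag {part!r} in {tag!r}")
    return ParsedTag(language, extended, region)


standard = parse_tag("ar-EG")
spoken = parse_tag("ar-arz-EG")
print(standard)
print(spoken)
print(standard == spoken)  # False: same language and country, different locale
```

A tag with no country ID, such as `parse_tag("kpe")`, still parses, which reflects the earlier remark that not all language tags require a country ID.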
    >> But today, if we were filing two locales for Kpelle, what would be the best
    >> coding? I'm assuming that kpe-LR and kpe-GN would be the best (or least bad)
    >> choices even if later the xpe and gkp have to be added?
    > Again, a question for the CLDR list.
    >> So another question (sorry these are accumulating) is what kpe-xpe-LR and
    >> kpe-gkp-GN locales would offer to a group localizing for Kpelle "kpe" as a
    >> transborder, multidialect (macro)language?
    > At this point, I think that's a question for the language communities to
    > decide, not us.
    > 4. Going back to ISO-639 in general (I know this subject has been discussed
    > before but please bear with me), is there going to be any kind of feedback
    > between the processes of developing locales and localization on one hand and
    > amending the list of ISO-639 codes on the other? I recall there being some
    > mention of a block on new ISO-639-1 and 2 codes, or that a 1 code will not be
    > given where there is a 2 code, but that *maybe* a new 1 and 2 code could be
    > given (Runyakitara might be a candidate for the latter). Also mention of
    > possible additional ISO-639 codes beyond the three ranges already. What is
    > the latest on all this?
    >> >> 4. Going back to ISO-639 in general [...] What is the latest on all
    >> >> this?
    >> >
    >> > I think, but I am not sure, that no new 639-1 codes can be added after
    >> > 639-3 goes into effect. (In principle, a language missed by 639-3 could
    >> > be added simultaneously to -1, -2, and -3, but the chance that such a
    >> > language both has been missed and meets the criteria for -1 is small.)
    > The JAC loosely committed not to add something to -1 that was already in -2.
    > (I say "loosely" meaning that they did not rule out the possibility that
    > circumstances might change in the future mandating a need for a new
    > alpha-2 where an alpha-3 already exists in -2.) The JAC has never made a
    > similar commitment wrt -3. But, we were just recently discussing the future
    > of -1, and while this specific concern wrt -3 didn't come up, we were
    > thinking that we should further constrain -1 so that requests to add alpha-2
    > would no longer be accepted from anybody but could only come from an ISO
    > member body. This would really reduce the number of requests we get for
    > alpha-2 IDs.
    >> > Any 639-3 language could be added to 639-2, using the same code element
    >> > for it in both parts of the standard.
    > 639-2 will become a subset of the union of 639-3 and 639-5 (the latter for
    > collections); there will be a single alpha-3 code space. The criteria for
    > inclusion in 639-2 are likely to get further constrained from what they are now.
    > In effect, 639-2 will become a profile of alpha-3 of interest to a
    > particular user community; the TC46 reps to the JAC will be working on a
    > proposal for how we define that user community.
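The relationship described here, with 639-2 as a subset of the union of 639-3 and 639-5 inside a single alpha-3 code space, can be sketched with Python sets. The memberships below are tiny illustrative samples chosen for this example, not the actual code tables.

```python
# Illustrative samples only (assumption): not the full ISO 639 code tables.
ISO_639_3 = {"kpe", "xpe", "gkp", "yue", "arz"}  # languages and macrolanguages
ISO_639_5 = {"bnt", "afa"}                       # collections
ISO_639_2 = {"kpe", "bnt", "afa"}                # a profile of the alpha-3 space

# The single alpha-3 code space is the union of parts 3 and 5.
alpha3_space = ISO_639_3 | ISO_639_5

print(ISO_639_2 <= alpha3_space)         # True: 639-2 is a subset of the union
print(sorted(alpha3_space - ISO_639_2))  # alpha-3 codes outside the 639-2 profile
```

In this picture, tightening the inclusion criteria for 639-2 only shrinks the profile; it does not change which alpha-3 codes exist.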
    >> I'm thinking that language change, planning, and engineering would call
    >> for some flexibility on this...
    > There's no question that the plane of language varieties will change,
    > especially in developing nations as language planning and development
    > activities bring greater standardization and stabilization of languages.
    > This will be one of the challenges we face in language coding, and perhaps
    > also in software implementations. One thing to keep in mind is that
    > something like a software localization has potential to be a significant
    > factor in how the sociolinguistic scenery evolves.
    > Peter Constable

    This archive was generated by hypermail 2.1.5 : Wed Mar 01 2006 - 21:19:58 CST