Re: Questions: locales; CLDR process; ISO-639 (again)

From: Donald Z. Osborn (dzo@bisharat.net)
Date: Wed Mar 01 2006 - 01:06:17 CST

Next message: Otto Stolz: "[almost OT] Music score with RTL lyrics"

Next in thread: Mark Davis: "Re: Questions: locales; CLDR process; ISO-639 (again)"
Maybe reply: Mark Davis: "Re: Questions: locales; CLDR process; ISO-639 (again)"

John, Thanks for this reply. I respond in text below...

Quoting John Cowan <cowan@ccil.org>:

> Disclaimer: I speak only for myself, not for ISO, IETF, Unicode, or
> any of their components.
>
> Donald Z. Osborn scripsit:
>
>> 2a. [...] my thought is that in cases where the ISO-639-1 or 2
>> coded language has variants in ISO/DIS-639-3 defined more or less by
>> country, it makes sense to use the 1 or 2 code plus the country code
>> rather than the 3 code.
>
> If Ethnologue divides them into separate languages, that means they are
> more than just national variants, even if they happen to be separated
> by a national border. In any case, it's a good bet that at least
> some populations of speakers are on the "wrong" side of the border,
> and will want to use their own variety of the language, but with the
> cultural conventions (time zone, currency, whatever) of the country
> where they reside. So I would recommend *against* using country codes
> to discriminate between languages.

We run into the issue of "what is a language?" (which we don't need to debate
here other than noting that there are differences of opinion among experts),
and more importantly what are the practical levels of distinction among
different tongues (call them closely related languages [in a macrolanguage or
a cluster], dialects or whatever) necessary for localization.

One of the reasons *for* using country codes at some level is that for a number
of the many crossborder languages (or macrolanguages) the orthographies are set
by national authorities, and some vocabulary may differ based on colonial
heritage (in Africa, borrowings from English or French, for instance). The
latter may be accounted for by the language categories of Ethnologue (and
ISO/DIS-639-3) but the former, in an environment where text is the main
content, seems unavoidable.

Also I note that the locale form needs a language code and a country code. I am
not trying to argue the point here, only to understand how best to use the
system and all the various codes.

(BTW, your turn of phrase "speakers are on the 'wrong' side of the border,"
which I realize is just a turn of phrase, reminds me of one aspect of
Ethnologue's presentation that I am not fond of - in every case they seem
obliged to say "x [language], a language of y [country]" when in so many cases,
especially in Africa, it's unnecessary and misleading to try to put a language
into such a box. But this is tangential to the issue here.)

> Work on RFC 3066ter, which will incorporate ISO 639-3 tags, has not yet
> formally begun. The intention of most of the various players, however, is
> to use a design in which a language encompassed by a 639-3 macrolanguage
> will have a two-part language subtag, of the form zh-yue (Cantonese).
> So 639-3 code elements for languages that are *not* macrolanguages will
> be added directly, but code elements like yue will not: yue will only
> exist in Internet language tags as part of the compound subtag zh-yue.

Thanks for this clarification. Actually the "nesting" of the '3 codes under a '1
or a '2 code makes a lot of sense. Two questions:
1) Can one file a locale before 3/15 using this format "ff-ffm-ML" even though
the design is not yet official?
2) If not, would this imply that it is better to make a locale for a variant of
a "macrolanguage" using a '1 code or, if not available, a '2? So: ff-ML and not
ffm-ML (leaving the refinements with the '3 codes until later)?

Beyond that I see that there may be a lot of discussion on the roles and use of
the different codes in the case of different (macro)languages. In the case of
Arabic, for example, would a simple ar-EG be enough or would you need (or
alternatively want to rule out) ar-arz-EG (arz=Egyptian spoken Arabic), while
at the same time allowing perhaps that less widely spoken dialects in the
country be noted?

>> 2b. An example is Kpelle spoken in the Liberia-Guinea border area (it is
>> also known as Guerze in Guinea). There is an ISO-639-2 code, "kpe," and
>> separate ISO/DIS-639-3 codes for Kpelle of Liberia, "xpe," and Kpelle or
>> Guerze of Guinea, "gkp." My thought is that "kpe-LR" & "kpe-GN" are
>> preferable to "xpe" and "gkp" for locales.
>
> The RFC 3066ter language tags will be (unless something changes radically)
> kpe-xpe and kpe-gkp. The effect of this is that documents tagged with
> either code will match an attempt to find "kpe" documents.

Yes, this makes sense, and by extension it would for at least many other
"macrolanguages" (ff for Fulfulde/Pulaar, man for Manding - at least the
western tongues, ...).
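
To make the matching behaviour described in the quoted reply concrete, here is
a minimal sketch of subtag-prefix matching. It assumes the proposed tags
kpe-xpe and kpe-gkp from the discussion; the function name tag_matches is
hypothetical and not taken from any RFC or library.

```python
def tag_matches(language_range: str, tag: str) -> bool:
    """Return True if `tag` equals `language_range` or extends it by whole subtags.

    Comparison is done subtag by subtag (split on "-") and ignores case,
    so "kpe" matches "kpe-xpe" but not a longer first subtag such as "kpelle".
    """
    range_parts = language_range.lower().split("-")
    tag_parts = tag.lower().split("-")
    return tag_parts[:len(range_parts)] == range_parts


# Hypothetical document tags, using the forms proposed in this thread.
document_tags = ["kpe-xpe", "kpe-gkp", "ff-ML", "kpe"]

# A request for "kpe" finds every Kpelle document, whatever the variety.
kpelle_documents = [t for t in document_tags if tag_matches("kpe", t)]
```

Under these assumptions a search for "kpe" picks up kpe-xpe, kpe-gkp and plain
kpe, while a search for "kpe-xpe" picks up only the Liberian variety.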

But today, if we were filing two locales for Kpelle, what would be the best
coding? I'm assuming that kpe-LR and kpe-GN would be the best (or least bad)
choices even if later the xpe and gkp have to be added?

>> 2c. Part of this gets back to the definition of what is a language, but for
>> purposes of software localization it may be simpler to go for the higher
>> level of aggregation and distinguish by country (which it seems one has to
>> do anyway). Even this may not be satisfactory in all cases as there are
>> often significant dialect (or language) differences in a language (or
>> "macrolanguage" in SIL's system) within a country.
>
> For that case, RFC 3066bis (which is partly in effect now, though not
> entirely) provides machinery for adding subnational or non-national variety
> subtags: en-gb-scouse, for example, is the Scouse (Merseyside) dialect of
> U.K. English.

So, could we use kpe-xpe and kpe-gkp, or are kpe-xpe-LR and kpe-gkp-GN, however
redundant, better?

I need to backtrack here before moving on. When I think of an OpenOffice suite
localized in Kpelle for example - even though I don't speak a word of it and
know of no current effort to write a locale - I would think that kpe by itself
would suffice. Granted there are differences but in general I think that there
will always be an effort to write the software for the highest level of
aggregation, crossing borders and dialect (or language-within-macrolanguage)
differences. What's true for FOSS is also true for MS (noting that for example
an Inuktitut localization of Windows was conceived of for all variants).

So another question (sorry these are accumulating) is what kpe-xpe-LR and
kpe-gkp-GN locales would offer to a group localizing for Kpelle "kpe" as a
transborder, multidialect (macro)language?
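
One way to picture how the more specific tags could relate to a single "kpe"
localization is truncation fallback: a request for a specific tag drops
trailing subtags until something available is found. The sketch below only
illustrates that idea; the function names and the set of available locales
are hypothetical, and it is not a description of how CLDR or any given
software actually resolves locales.

```python
from typing import Iterable, List, Optional


def fallback_chain(tag: str) -> List[str]:
    """List `tag` followed by each shorter form made by dropping trailing subtags."""
    parts = tag.split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def resolve(tag: str, available: Iterable[str]) -> Optional[str]:
    """Return the most specific available locale for `tag`, or None if none fits."""
    available_set = set(available)
    for candidate in fallback_chain(tag):
        if candidate in available_set:
            return candidate
    return None


# Assumption: only a single transborder Kpelle localization has been written.
available_locales = ["kpe", "ff-ML"]

chain = fallback_chain("kpe-xpe-LR")             # most specific form first
chosen = resolve("kpe-xpe-LR", available_locales)
```

With only "kpe" available, requests for kpe-xpe-LR and kpe-gkp-GN both end up
at the shared localization; a more specific locale would matter only where it
actually supplies something different (orthography, currency, and so on).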

>> 4. Going back to ISO-639 in general [...] What is the latest on all this?
>
> I think, but I am not sure, that no new 639-1 codes can be added after
> 639-3 goes into effect. (In principle, a language missed by 639-3 could
> be added simultaneously to -1, -2, and -3, but the chance that such a
> language both has been missed and meets the criteria for -1 is small.)
> Any 639-3 language could be added to 639-2, using the same code element
> for it in both parts of the standard.

I'm thinking that language change, planning, and engineering would call for some
flexibility on this. Dialect levelling, adoption of standard versions for
literacy and instruction in schools, grouping of closely related tongues (as in
the case of Runyakitara, which is designed for teaching but is not [yet?] a
macrolanguage listing), and indeed localization efforts, all mean a shifting
terrain.

Add to that the facts that there are "clusters" of languages that are closely
related but not identified as part of a larger grouping (macrolanguage) and
that at least one agency, CASAS, is researching the bases for standardization /
harmonization of some of these, and it would seem that the overall language
situation is dynamic.

I apologize for being so wordy, but there seem to be a lot of issues involved.
Of the many questions, the urgent ones are those that would help in the writing
of locales for a number of African languages in the next couple of weeks (!)

Thanks again.

Don


This archive was generated by hypermail 2.1.5 : Wed Mar 01 2006 - 01:09:31 CST