Re: Making orthographies computer-ready

From: Doug Ewell
Date: Tue Jul 30 2002 - 01:21:05 EDT

Addison Phillips [wM] <aphillips at webmethods dot com> wrote:

> The first stop in creating a new orthography (or computerizing an
> existing one, perhaps from the days of the typewriter), for my money
> would probably be to get ISO-639 to issue the language a 2-letter code
> so you can have locale (and Unicode character database) data tagged
> with it ;-).

The ISO 639 Maintenance Agency has resolved not to add any new 2-letter
(ISO 639-1) codes for the languages that already have 3-letter (ISO
639-2) codes. This is described in RFC 3066.

<Peter_Constable at sil dot org> responded:

> OK, now you've hit a hot button: The industry needs to wake up to the
> fact that the requirement that a language have an ISO-639 2-letter
> code before a locale can be created is a dead end. There just aren't
> enough 2-letter codes to go around, and ISO 639-2 has restrictive
> requirements for doling out 2-letter codes -- it wasn't created for
> the benefit of locale implementers, but for the benefit of
> terminologists.

And bibliographers. In any case, the real problem is not the ISO 639
"50 documents" restriction. No language coding system -- ISO 639 or
otherwise -- is sufficient to describe locales, because locales consist
of more than just languages. Lumping together all English speakers in
the world, for example, would be just silly. The standard solution is
to append a country code, as though locale were simply a matter of
language+country, which is almost as silly -- it assumes all English
speakers in the U.S. can use the same settings, but German speakers in
Switzerland and Liechtenstein require different settings.
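
To make the language+country convention concrete, here is a minimal
Python sketch that splits a tag into a language part and an optional
country part. The name split_tag is hypothetical, and the pattern
follows the RFC 3066 grammar as I read it (a primary subtag of 1 to 8
letters, then subtags of 1 to 8 letters or digits). It checks syntax
only, not whether a code has actually been assigned.

```python
import re

# Primary subtag: 1-8 ASCII letters; each later subtag: 1-8 letters or digits.
# (Assumed reading of the RFC 3066 grammar; syntax check only.)
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def split_tag(tag):
    """Return (language, country_or_None) for a well-formed tag."""
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a well-formed language tag: {tag!r}")
    subtags = tag.split("-")
    language = subtags[0].lower()
    country = None
    # A 2-letter second subtag is read as an ISO 3166 country code.
    if len(subtags) > 1 and len(subtags[1]) == 2 and subtags[1].isalpha():
        country = subtags[1].upper()
    return language, country


if __name__ == "__main__":
    for tag in ("en-US", "de-CH", "de-LI", "haw", "lui"):
        print(tag, "->", split_tag(tag))
```

Note that a 3-letter primary subtag such as "haw" passes through the
same code path as a 2-letter one; nothing in the syntax favors either.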

> Luiseño and Tongva simply are not candidates.

Luiseño does have a 3-letter code (lui), while Tongva has neither an ISO
639 code nor an Ethnologue code (the on-line Ethnologue has no listing
for either Tongva or Gabriel[ie][nñ]o). So the criteria at least for
these two languages seem to be similar.

> This very issue was raised with the relevant ISO committee in relation
> to Hawaiian: a 2-letter code was requested specifically because
> someone was trying to get a Unix implementation developed and was told
> by the engineers that it couldn't be done without an ISO 2-letter code.
> Well, I'm pretty sure Hawaiian isn't going to get it, because it
> doesn't meet the requirements for ISO 639-1.

The requirement that it doesn't meet is that it already has a 3-letter
code (haw). BTW, note that RFC 3066, in citing Hawaiian as an example
of this requirement, misstates the code for Hawaiian as "hwi".

> Instead of asking for a 2-letter code, the engineers should have been
> looking at what it would take to make the software support a 3-letter
> code (which already exists in ISO 639-2).

Again, RFC 3066 spells out very clearly how to do this. It should be
followed by all systems and applications that deal with language tags.
It was published in January 2001; unfortunately, far too many people
think RFC 1766 is still current.
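
As a rough sketch of that rule in software: use the 2-letter code when
the language has one, and fall back to the 3-letter code otherwise. The
function name primary_subtag and the SAMPLE_CODES table are hypothetical,
and the table is a tiny hand-picked sample for illustration, not the
registry itself.

```python
# Illustrative sample only: ISO 639-2 code -> ISO 639-1 code,
# or None where the language has no 2-letter code.
SAMPLE_CODES = {
    "eng": "en",   # English
    "deu": "de",   # German
    "haw": None,   # Hawaiian
    "lui": None,   # Luiseño
}


def primary_subtag(iso639_2):
    """Pick the primary subtag: the 2-letter code if one exists, else the 3-letter code."""
    code = iso639_2.lower()
    if code not in SAMPLE_CODES:
        raise KeyError(f"code not in the sample table: {iso639_2!r}")
    return SAMPLE_CODES[code] or code


if __name__ == "__main__":
    for code in ("eng", "deu", "haw", "lui"):
        print(code, "->", primary_subtag(code))
```

A system built this way needs no 2-letter code for Hawaiian or Luiseño;
it only needs a field wide enough to hold three letters.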

I know of one mechanism, at least, that is fully conformant to RFC 3066
and fully supports 3-letter codes. Too bad the UTC is getting ready to
deprecate it. ;-)

Addison replied:

> You are much more likely to get people's attention if you have a
> by-god-two-letter code than if you don't. (Today) you just can't
> ignore the perception that two-letter codes are somehow "legit" and
> three-letter codes somehow aren't... and that too many locale
> structures are based explicitly on the two-letter flavor.

While it's true that many ISO 639-based systems still don't support ISO
639-2 codes, there is no justification for considering them any less
"legit" than ISO 639-1 codes. People who do this are probably also the
ones who, when a telephone area code split or overlay occurs, regard the
new area code as less "prestigious" or "legit" than the old one (which
it is, for about 72 hours) and file lawsuits to prevent the relief plan
from being implemented or to avoid being placed in the new code.

-Doug Ewell
 Fullerton, California
