RE: (long) Making orthographies computer-ready (was *not* Telephoning Tamil)

From: Addison Phillips [wM] (
Date: Mon Jul 29 2002 - 18:59:07 EDT

I know, hence the jocular tone with wink-and-smile. You are much more likely
to get people's attention if you have a by-god-two-letter code than if you
don't. Today you just can't ignore the perception that two-letter codes are
somehow "legit" and three-letter codes somehow aren't, nor the fact that too
many locale structures are built explicitly around the two-letter flavor.

On the other hand, I suspect that the two-letter dogma is more historical
habit than actual technical requirement. For example, there are real Solaris
locales with names like "japanese". Java allows you to construct a Locale
from any pair or trio of strings (though such a locale won't mean much, since
you can't populate the data files for it). And so on. Just because no one
makes locales using 3-letter codes doesn't mean it is technically impossible.
(Nor, of course, does it mean there are no restrictions at all.)
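
To make the "no technical barrier" point concrete: a locale lookup that walks
from the most specific ID down to the root never needs to know how long the
language code is. A minimal sketch in Python, assuming made-up locale data and
hypothetical function names (fallback_chain, lookup). The parent-chain idea is
loosely similar to how Java's ResourceBundle falls back from language_country
to language, but this is not that implementation:

```python
# Hypothetical locale data for illustration only. Keys are locale IDs
# ("" is the root locale); nothing here assumes a two-letter language code.
LOCALE_DATA = {
    "": {"greeting": "Hello"},
    "ja": {"greeting": "こんにちは"},
    "haw": {"greeting": "Aloha"},
}


def fallback_chain(locale_id: str) -> list:
    """Most to least specific: 'haw_US' -> ['haw_US', 'haw', '']."""
    parts = [p for p in locale_id.split("_") if p]
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    chain.append("")  # always end at the root locale
    return chain


def lookup(locale_id: str, key: str) -> str:
    """Return the first value for `key` found along the fallback chain."""
    for candidate in fallback_chain(locale_id):
        table = LOCALE_DATA.get(candidate)
        if table is not None and key in table:
            return table[key]
    raise KeyError(key)
```

Here lookup("haw_US", "greeting") finds the "haw" entry, while an ID with no
data of its own, such as "fr_FR", simply falls through to the root.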

Of course, I understand why a company might make a business decision not to
build and support a locale for a language that doesn't qualify for a
two-letter code. The lack of a compelling business reason to build, change,
or test support for minority languages is probably a bigger limiter here
than any actual engineering obstacle.
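
The engineering side of Peter's suggestion below is small. A minimal sketch in
Python of the normalization rule that, as I understand it, RFC 3066 gives for
language tags: use the ISO 639-1 two-letter code when one exists, otherwise
the ISO 639-2 three-letter code. The mapping table is a tiny hand-picked
excerpt and canonical_language is a hypothetical name, not any product's API:

```python
# Tiny, hand-picked excerpt of ISO 639-2 codes that have an ISO 639-1
# equivalent. A real system would load the full code lists.
ISO639_2_TO_1 = {
    "eng": "en",
    "fra": "fr",  # terminology form
    "fre": "fr",  # bibliographic form
    "jpn": "ja",
    "tam": "ta",
}


def canonical_language(code: str) -> str:
    """Prefer the 2-letter code when one exists; otherwise keep the 3-letter one."""
    code = code.strip().lower()
    if not (code.isascii() and code.isalpha()) or len(code) not in (2, 3):
        raise ValueError(f"not an ISO 639-1/639-2 style code: {code!r}")
    if len(code) == 3:
        # Known 3-letter codes collapse to 2 letters; others (e.g. "haw") pass through.
        return ISO639_2_TO_1.get(code, code)
    return code
```

So "eng" collapses to "en", while Hawaiian keeps its ISO 639-2 code "haw", and
the rest of the locale machinery can treat both kinds of code the same way.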


> -----Original Message-----
> From: [] On Behalf Of
> Sent: Monday, July 29, 2002 3:02 PM
> To:
> Subject: Re: (long) Making orthographies computer-ready (was *not* Telephoning Tamil)
> On 07/29/2002 03:56:36 PM "Addison Phillips [wM]" wrote:
> >Nonetheless, if you glance at the "SpecialCasing" file in Unicode, you will
> >note that almost without exception the entries are locale driven. The first
> >stop in creating a new orthography (or computerizing an existing one, perhaps
> >from the days of the typewriter), for my money would probably be to get
> >ISO-639 to issue the language a 2-letter code so you can have locale (and
> >Unicode character database) data tagged with it ;-).
> OK, now you've hit a hot button: The industry needs to wake up to the fact
> that the requirement that a language have an ISO-639 2-letter code before a
> locale can be created is a dead end. There just aren't enough 2-letter
> codes to go around, and ISO 639-1 has restrictive requirements for doling
> out 2-letter codes -- it wasn't created for the benefit of locale
> implementers, but for the benefit of terminologists. Luiseño and Tongva
> simply are not candidates. This very issue was raised with the relevant ISO
> committee in relation to Hawaiian: a 2-letter code was requested
> specifically because someone was trying to get a Unix implementation
> developed and was told by the engineers that it couldn't be done without an
> ISO 2-letter code. Well, I'm pretty sure Hawaiian isn't going to get it,
> because it doesn't meet the requirements for ISO 639-1. Instead of asking
> for a 2-letter code, the engineers should have been looking at what it
> would take to make the software support a 3-letter code (which already
> exists in ISO 639-2).
> - Peter
> ---------------------------------------------------------------------------
> Peter Constable
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> E-mail: <>
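
On the SpecialCasing point quoted above: a language-conditional casing table
can be keyed on the language subtag whatever its length. A minimal sketch in
Python covering only the Turkish/Azeri dotted and dotless i entries. The
tables are a hand-copied excerpt, the function names are hypothetical, and the
context conditions in the real file (such as Not_Before_Dot) are ignored:

```python
# Hand-copied excerpt of the language-conditional entries for "i" in
# SpecialCasing.txt (tr = Turkish, az = Azeri). Context conditions such as
# Not_Before_Dot are deliberately ignored in this sketch.
_TURKIC_UPPER = {"i": "\u0130"}                 # i -> capital I with dot above
_TURKIC_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> dotless i; dotted capital I -> i

SPECIAL_UPPER = {"tr": _TURKIC_UPPER, "az": _TURKIC_UPPER}
SPECIAL_LOWER = {"tr": _TURKIC_LOWER, "az": _TURKIC_LOWER}


def primary_language(tag: str) -> str:
    """'tr-TR' or 'tr_TR' -> 'tr'; 'haw_US' -> 'haw' (any code length works)."""
    return tag.replace("_", "-").split("-")[0].lower()


def upper_for(text: str, tag: str) -> str:
    """Uppercase `text`, applying language-specific overrides first."""
    overrides = SPECIAL_UPPER.get(primary_language(tag), {})
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


def lower_for(text: str, tag: str) -> str:
    """Lowercase `text`, applying language-specific overrides first."""
    overrides = SPECIAL_LOWER.get(primary_language(tag), {})
    return "".join(overrides.get(ch, ch.lower()) for ch in text)
```

With these tables, "istanbul" uppercases to "İSTANBUL" under "tr" but to
"ISTANBUL" under "en"; adding rules keyed on a three-letter code such as "haw"
would need no structural change.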

This archive was generated by hypermail 2.1.2 : Mon Jul 29 2002 - 16:58:27 EDT