REALLY *not* Tamil - changing scripts (long)

From: Addison Phillips [wM]
Date: Fri Jul 26 2002 - 23:46:16 EDT

I dunno, Curtis. This sounds less like a job for Unicode and more like a job for other mechanisms, such as user-defined locales.

Granted, keyboarding is a pain if you choose a character collection that is not represented by a convenient keyboard. But the real issues appear to be mostly in linguistically related processing (word breaking, sentence breaking, collation, and the like). In most cases these are not things that Unicode per se can help with, but user-defined locale data could.
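
To make the word-breaking point concrete, here is a minimal Java sketch. Everything named in it is an assumption for illustration: the class AtSignWordBreak, the words() helper, the extraLetters parameter standing in for "locale data", and the made-up sample string (not real Luiseño or Tongva). It only shows that a breaker driven purely by Unicode letter properties splits at @, while one that consults per-orthography data does not.

```java
import java.util.ArrayList;
import java.util.List;

class AtSignWordBreak {
    // Split text into words. extraLetters lists characters that this
    // (hypothetical) orthography treats as letters even though the
    // Unicode properties say they are not.
    static List<String> words(String text, String extraLetters) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean letter = Character.isLetter(cp) || extraLetters.indexOf(cp) >= 0;
            if (letter) {
                cur.appendCodePoint(cp);
            } else if (cur.length() > 0) {
                // A non-letter ends the current word.
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "ab@cd ef"; // made-up tokens, not real words
        System.out.println(words(sample, ""));  // [ab, cd, ef]
        System.out.println(words(sample, "@")); // [ab@cd, ef]
    }
}
```

The same override would have to be repeated in search, selection, and spell-checking code, which is why it belongs in shared locale data rather than in each application.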

Let's take the putative Tongva @ letter as an example. If I had to create a locale in, say, Java for it, I could create special casing information (if @ has case), a collation table, breaking tables, and the like and nail most of the issues that you have. Even loading a "spell checker" is really a locale- or language-related problem in most systems today. The main problem would be if you were using @ but actually MEANT û or some such. I.e., the Klingon problem, but with a real language.
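
As a rough sketch of the collation-table part, java.text.RuleBasedCollator accepts tailoring rules as a string. The rule string below is purely hypothetical: it covers only four characters and places @ between u and v, which is an assumption for the sake of the example and not the actual ordering of any language. The class and method names are likewise made up. Note that @ is a rule-syntax character for this class, so it has to be single-quoted inside the rules.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class AtSignCollation {
    // Hypothetical tailoring: t < u < @ < v, all primary differences.
    static final String RULES = "< t < u < '@' < v";

    // Sort with the tailored collator, so '@' sorts as a letter after u.
    static List<String> sortTailored(List<String> words) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    // Sort by plain String order, where '@' (U+0040) precedes every letter.
    static List<String> sortByCodeUnit(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("tv", "t@", "tu"); // made-up tokens
        System.out.println(sortByCodeUnit(words)); // [t@, tu, tv]
        System.out.println(sortTailored(words));   // [tu, t@, tv]
    }
}
```

A real locale would start from a full rule set rather than four letters; the point is only that the ordering lives in locale data, not in the character encoding.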

When that's the case, then you have a case for encoding a new character. But the escaping mechanisms in Unicode, like SpecialCasing, seem ample enough to handle minority languages like these in all cases where you are just creating an orthography using an existing writing system's bits and pieces. It's not like Unicode has defined, say, "vowelness" or pronunciation.
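
For what it's worth, the difference between borrowing @ and borrowing û shows up directly in the character properties a Java runtime reports. The sketch below uses only standard java.lang.Character calls; the class and helper names are illustrative assumptions. @ is punctuation with no case mapping, so any casing for it would have to come from locale data, whereas û already carries letter properties and a case pair.

```java
class BorrowedCharProps {
    // True if the runtime's Unicode data gives this code point a case mapping.
    static boolean hasCase(int cp) {
        return Character.toUpperCase(cp) != cp || Character.toLowerCase(cp) != cp;
    }

    // True if the code point's general category is Po (other punctuation).
    static boolean isOtherPunctuation(int cp) {
        return Character.getType(cp) == Character.OTHER_PUNCTUATION;
    }

    static String describe(int cp) {
        return String.format("U+%04X punctuation=%b cased=%b upper=U+%04X",
                cp, isOtherPunctuation(cp), hasCase(cp), Character.toUpperCase(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe('@'));      // U+0040 punctuation=true cased=false upper=U+0040
        System.out.println(describe('\u00FB')); // U+00FB punctuation=false cased=true upper=U+00DB
    }
}
```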

IOW: If you have a new character that needs encoding, then the UTC can probably be cajoled into encoding it. If you are using existing encoded characters from another writing system, then there is nothing to do >>in Unicode<< except note the exceptional use of those characters.

That does leave you with the much less happy problem of finding a platform with user-defined locales (approximately no platforms conveniently do this).

Obviously I'm not an expert in these linguistic areas (and hence rarely comment on them), but it seems to me that the lack of other mechanisms makes Unicode an attractive target for criticism in this area.

Best Regards,


Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.
432 Lakeside Drive
Sunnyvale, California, USA
+1 408.962.5487 (phone)
+1 408.210.3569 (mobile)
Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: [] On Behalf Of Curtis Clark
> Sent: Friday, July 26, 2002 6:46 PM
> To:
> Subject: *not* Tamil - changing scripts (long)
>
> James Kass wrote:
> > Isn't this kind of a Catch-22 for anyone contemplating script reform?
> > Do we discourage people from altering their own scripts? Should we?
> > It is suggested that scripts can be "alive" in the same sense that
> > languages are "alive"; changes (which are part of life) just occur
> > much more slowly in scripts.
>
> This touches on some "Unicode vs. the world" issues I've been thinking
> about, having to do with indigenous peoples developing orthographies for
> their own languages.
>
> My two examples are both languages of the Takic group in southern
> California. The Luiseño language declined to a very few native speakers,
> but has enjoyed a renaissance in recent years. The Gabrieleno (Tongva)
> language was effectively extinct—no native speakers, no recordings, some
> amount of written documentation—but the Tongva are resurrecting it (it
> is similar enough to the other Takic languages that it is possible to
> reconstruct parts that are missing).
>
> Anthropological accounts of both languages are of course in the phonetic
> alphabets beloved by linguists in the days before IPA stabilization.
> And, like many other Native Americans, the Luiseño and Tongva have
> wanted simpler orthographies that can be typed with US-English keyboards.
>
> I don't have a lot of familiarity with Luiseño, but web pages have
> included passages where non-letters (such as @) are used as letters.
> This solves the keyboarding problem (since few people would try to
> pronounce an email address as Luiseño), but I imagine all sorts of
> issues with sorting, searching, word selection, casing, and all the
> other sorts of things that computers can do for "major" languages.
>
> Where all this involves me is with Tongva. I have been working with a
> Tongva ethnobotanist on a project that, among other things, involves
> plant labels in Tongva, English, and Latin. Tongva spelling is currently
> inconsistent, and my colleague has been regularizing it for this project
> (because he is the primary language teacher for the nation, and few have
> any fluency at all, he has this freedom). Somewhat like English, Tongva
> represents both the "oo" and "uh" sounds by "u". Unlike English,
> the rest of the orthography provides no clues to which sound is meant.
>
> /If/ my colleague were to ask (and the Tongva may be satisfied with the
> existing orthography), I would suggest representing the "uh" sound with
> a Latin-1 letter (possibly û), and explain several simple alternatives
> for keyboarding it on Mac and Windows. I would *not* suggest overloading
> @, or some similar approach.
>
> I suppose that Unicode could add at some point "Luiseño letter @", with
> appropriate properties, but that would circumvent the reason for picking
> it: its presence in US-ASCII. In an ideal world, indigenous peoples
> would hook up with folks like Michael Everson (or even me) and get some
> guidance on how to have their orthography and eat it, too, but as things
> now stand, overloading, font hacks, and the like are the path of least
> resistance.
>
> --
> Curtis Clark
> Mockingbird Font Works
