RE: (long) Making orthographies computer-ready (was *not* Telephoning Tamil)

From: Addison Phillips [wM]
Date: Mon Jul 29 2002 - 20:51:09 EDT

> > One that occurs to me might be the Khoisan languages of Africa,
> > which I believe commonly use "!" (U+0021) for a click sound.
> > This is almost exactly the same problem you are describing for Tongva.
> U+01C3 LATIN LETTER RETROFLEX CLICK (General Category Lo) was
> encoded precisely for this. It is to be *distinguished* from
> U+0021 '!' EXCLAMATION MARK to avoid all of the processing problems
> which would attend having a punctuation mark as part of your letter
> orthography. A Khoisan orthography keyboard should distinguish the
> two characters (if, indeed, it makes any use at all of the exclamation
> mark per se), so that users can tell them apart and enter them
> correctly.

Amazing! It is there (and has been "forever", since it has a Unicode 1.0
name) and doesn't even normalize to ol' U+0021. Nonetheless, I suspect that
the exclamation mark came into use because ASCII was pressed into service
for the otherwise unrepresented sounds, and that the "should" in your note
remains at least somewhat unrealized. A brief Googling of "Khoisan" produces
pages that use !, #, //, and ' for the clicks encoded at U+01C0 through
U+01C3, including the Rosetta Project page, which is encoded as UTF-8 (!!)
but uses the ASCII characters, not the specially encoded letters cited.
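
The distinction discussed above can be checked with a minimal sketch, assuming Python 3 and its standard unicodedata and re modules. The helper names (is_normalization_stable, words) and the sample string are illustrative only and do not come from the thread.

```python
import re
import unicodedata

CLICK = "\u01C3"  # LATIN LETTER RETROFLEX CLICK
BANG = "\u0021"   # EXCLAMATION MARK


def is_normalization_stable(ch):
    """Return True if none of the four normalization forms changes ch."""
    return all(
        unicodedata.normalize(form, ch) == ch
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


def words(text):
    """Split text into runs of word characters (letters, digits, underscore)."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    # Print code point, character name, and General Category for each.
    for ch in (CLICK, BANG):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

    # The click letter has no decomposition, so it never folds to U+0021.
    print(is_normalization_stable(CLICK))

    # Hypothetical sample word: the same spelling with each character.
    print(words(CLICK + "kung"))
    print(words(BANG + "kung"))
```

The click letter reports General Category Lo and stays inside the word when the text is split, while the exclamation mark reports Po and is dropped as punctuation. That is the kind of processing problem the quoted note warns about.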

Of course, none of the sites I searched was actually IN one of these
languages. Every one that I saw was in English (one had a link to an
Afrikaans page). Perhaps the various Khoisan peoples who have web pages are
using the Unicode characters in question. But the likely prevalence of
English (or at least Western European) keyboards and systems probably has
encouraged the widespread non-adoption of the correct characters (hence,
this may be the example that proves the rule, although I can't think of
anything else that looks more like a click than a bang or an octothorpe ;-).

