Re: REALLY *not* Tamil - changing scripts (long)

From: Kenneth Whistler (
Date: Mon Jul 29 2002 - 18:21:03 EDT

> > It's *much* easier -- and, in the long term, safer -- for them to
> > select from the extensive inventory of characters available in Unicode and
> > to avoid using ASCII punctuation characters with redefined word-building
> > semantics.
> I don't get what you are saying here, why should people be limited to
> ASCII punctuation characters?

That isn't what Peter was saying. You have misinterpreted his point.

The recommendation that Peter was making is that people devising orthographies
for languages should stick to Unicode letters for the letters of their
orthography. (If the script in question is Latin, as most new orthographies
are, then there are *hundreds* of Latin letters to choose from in the standard.)

What orthography developers should avoid is using characters like "7", "@", "!",
"$", "'" and so on as letters of their orthography, since those are certain
to cause all kinds of havoc with word-break and other processes in standard
software -- or even lead to absurdities such as people wanting illegal
constructs like 'jo', which locales can*not* fix.
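
To make the word-break point concrete, here is a minimal sketch in Python. It
assumes the standard `re` and `unicodedata` modules as a stand-in for "standard
software"; the sample strings ("!kung", "ha'a") and the helper name `words` are
illustrative only and are not taken from any particular orthography proposal.
It compares ASCII punctuation used as a letter against a dedicated Unicode
letter serving the same role.

```python
import re
import unicodedata


def words(text):
    """Split text into words using the regex engine's default word class."""
    return re.findall(r"\w+", text)


# ASCII punctuation pressed into service as letters (illustrative strings).
ascii_click = "!kung"          # U+0021 EXCLAMATION MARK
ascii_glottal = "ha'a"         # U+0027 APOSTROPHE

# The same words spelled with characters that Unicode classifies as letters.
letter_click = "\u01C3kung"    # U+01C3 LATIN LETTER RETROFLEX CLICK
letter_glottal = "ha\u02BCa"   # U+02BC MODIFIER LETTER APOSTROPHE

for sample in (ascii_click, letter_click, ascii_glottal, letter_glottal):
    # Show the general category of each character next to the word split.
    categories = [unicodedata.category(ch) for ch in sample]
    print(repr(sample), categories, words(sample))
```

With the punctuation spellings the tokenizer drops the "!" and splits at the
apostrophe; with the letter spellings each word comes back whole, with no
locale configuration involved.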

Just as choices about rational orthographies used to have to take ease of
use on typewriters as a major factor (to fail to do so would be to condemn
legions of people to wretched inefficiency), so choices about new rational
orthographies should now be taking ease of use on computers as a major
factor. That is just a realistic approach that any *serious* deviser of an
orthography should be taking.

> With GNU libc you can declare your own set
> of punctuation characters in the locale, and they can be any 10646
> character.

Peter was talking about the opposite case. But you should examine carefully
what the implications are of your suggestion here. If I were to make the
absurd choice of picking 18 Chinese characters to serve as my punctuation
characters, and then went through the exercise of declaring my own
locale with GNU libc, I would be guaranteeing that my locale (and all
my text data) would function correctly only in a microscopic environment
that I defined (or could browbeat a few others into sharing).

The reason for sticking to the Universal Character Set and for sticking
to standardized properties for the characters in that set is to
guarantee widespread interoperability and to guarantee that my text,
in my language, works correctly in all off-the-shelf software -- not
merely in my own hacked-up locale.

Serious orthography designers should not allow themselves to get
stuck in such dead-end traps.


> Or are you referring to the specific locale syntax from
> POSIX/TR 14652?
> Kind regards
> Keld
