Re: A basic question on encoding Latin characters

From: Scott Horne (shorne@metaphasetech.com)
Date: Wed Sep 29 1999 - 19:54:32 EDT


Kenneth Whistler wrote:
>
> It would be a good idea to try to separate the market realities of the
> information industry (where Scott's concerns are, unfortunately, well-
> founded and part of a pattern of domination by the industrial,
> developed part of the world) from the technical issues of character
> set encoding and architecture that we are discussing on this list.

We can try, but I doubt we'll succeed. Just a few days ago it was pointed out
here that industry will probably resist the disunification of the florin sign
and the lower-case hooked _f_ because the former has already been mapped to
the latter in various fonts and code pages. Predictably, the needs of speakers
of Ewe and other languages spoken by impoverished black Africans are subordinate
to the convenience of corporations owned by white Westerners.

> Note, by the way, that the list of privileged languages includes Japanese,
> Chinese, Korean, and Arabic -- so is not limited to European and
> American colonialists and hegemonists.

No? Many Japanese people have strong objections to the Han unification.
(So do I, for that matter. I also disapprove of the sloppy use of the
inaccurate and insulting term _ideographs_ for Chinese characters. But
I digress.) Back when that decision was being forced on everyone, the
attitude of its powerful proponents was "These @#$& Japanese won't
listen to reason".

Japanese, Chinese, Korean, and Arabic are only "privileged" because of
their commercial importance to the West. Hindi, with far more speakers
than Japanese, is not similarly privileged because until recent years the
computer industry gleefully (and, I'm sorry to say, correctly) assumed
that most Hindi-speakers who were able to purchase and use computers would
be willing or even happy to do their work in English.

> It is just that some of us happen to believe that the
> particular *majority* script known as Latin

Majority? Yes, if dollars are the unit of measure. But no more than
a few tens of millions of people in China and India are native speakers
of languages that are usually written in the Latin script. That leaves more
than two billion native speakers of languages written in other scripts.
Add Japan, Korea, Bangladesh, Thailand, Myanmar, Pakistan, and most of
Russia, and we already have a majority of the world's population without
even leaving northern and eastern Asia (OK, and a little piece of Europe).

> is *already* fully
> encoded (in fact, re-encoded over and over: there are 1001 LATIN
> letters now in Unicode 3.0, with more coming in Unicode 4.0!) according
> to its native script principles.

What evidence is there for that claim? Are you willing to bet that there
isn't a single letter in the Latin script that cannot be produced from
letters, combining diacritics, and other devices already found in Unicode?
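
For anyone who wants to test the distinction at issue, here is a minimal
sketch in Python. It assumes the standard unicodedata module, whose tables
follow a much later Unicode version than 3.0, and the helper name compose is
hypothetical. It shows only that some base-plus-diacritic sequences have a
precomposed code point while others must remain combining sequences; it
says nothing about whether every Latin letter in use can be built this way.

```python
import unicodedata

def compose(base, *marks):
    """Return the NFC form of a base letter followed by combining marks.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", base + "".join(marks))

# "e" + COMBINING ACUTE ACCENT has a precomposed form, U+00E9.
e_acute = compose("e", "\u0301")
print(len(e_acute), [unicodedata.name(c) for c in e_acute])

# "q" + COMBINING ACUTE ACCENT has no precomposed form, so NFC leaves
# the two code points as they are.
q_acute = compose("q", "\u0301")
print(len(q_acute), [unicodedata.name(c) for c in q_acute])
```

Whether the second case counts as "fully encoded" is exactly the question.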

> If you were working to establish an international web presence for
> the Tamazight language, would you rather work with that ruthless
> bastion of capitalist computer industry hegemonism, the Unicode
> Consortium (which "doesn't give a tinker's damn about your [language]"),

There's no need for sarcasm. I'm not doubting the sincerity of (most,
or at least many) members of the Unicode Consortium. I'm just pointing out,
as Gregg Reynolds did so well yesterday, that the political nature of
Unicode is often ignored, perhaps deliberately, in the course of routine
technical work.

> Or should you
> depend on the tender mercies of the Algerian government to represent
> your character encoding interests to the appropriate ISO committee to
> get precomposed Latin characters for Tamazight encoded so you can
> be a "first-class citizen" in the international standard?

Who's to say that I'd want precomposed Latin letters? Perhaps I'd use
the Tamazight or Arabic script.

No, I wouldn't expect the Algerian government to help with Tamazight.
But nor would I expect enthusiastic coöperation from the computer
industry.

Scott Horne


