The politics of Unicode (was: an endlessly coruscating thread on a basic question)

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Sep 29 1999 - 21:26:22 EDT


Scott,

My, my, the list is feisty today. Well, since you want to pursue it...

>
> Kenneth Whistler wrote:
> >
> > It would be a good idea to try to separate the market realities of the
> > information industry (where Scott's concerns are, unfortunately, well-
> > founded and part of a pattern of domination by the industrial,
> > developed part of the world) from the technical issues of character
> > set encoding and architecture that we are discussing on this list.
>
> We can try, but I doubt we'll succeed. Just a few days ago it was pointed out
> here that industry will probably resist the disunification of the florin sign
> and the lower-case hooked _f_ because the former has already been mapped to
> the latter in various fonts and code pages. Predictably, the needs of speakers
> of Ewe and other languages spoken by impoverished black Africans are subordinate
> to the convenience of corporations owned by white Westerners.

Michael Everson gave pretty much the entire list of problematical
unifications. (There are a few others that are notable problems for
rendering, such as baseline ellipsis versus midline ellipsis.) All
of these are under debate, and there are no foregone conclusions for
any of them.

Any disunification of an already encoded character carries a cost for
implementations, but if I had to bet, I would say this one will eventually
be disunified: keeping the unification has hidden costs of its own, namely
inconsistent property treatment between the currency symbol and the
alphabetic letter.
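To make that property clash concrete, here is a short Python sketch using the standard unicodedata module (the comparison with U+00A3 POUND SIGN is my own illustration, not from the original message). A single code point can carry only one set of Unicode properties, so the unified U+0192 is classified as a letter even when it is being used as the florin currency sign:

```python
import unicodedata

# U+0192 serves double duty as the florin sign and the letter "f with hook",
# but its general category is "Ll" (lowercase letter), not "Sc" (currency
# symbol) as it would be for a dedicated currency character like U+00A3.
print(unicodedata.name('\u0192'))      # LATIN SMALL LETTER F WITH HOOK
print(unicodedata.category('\u0192'))  # Ll
print(unicodedata.category('\u00a3'))  # Sc
```

Any process that cases, sorts, or word-breaks text by general category will therefore treat a florin amount as if it contained a letter, which is one of the hidden costs mentioned above.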

But painting the standards committees as a bunch of black hats and shills
for white Western corporate interests ignoring the needs of Ewe speakers
is pretty disingenuous in this case. It was the Unicode Technical Committee
and WG2 that brought the entire Pan-African alphabet repertoire onto the
table in the first place and got it encoded in the International Standard.
U+0192 f with hook was a mistaken unification, but that is an explicit
small problem to be addressed through the standards process, rather than
some symptom that the whole standard is rigged to ignore the interests of
Black Africa because it is irrelevant to Western corporate power.

>
> > Note, by the way, that the list of privileged languages includes Japanese,
> > Chinese, Korean, and Arabic -- so is not limited to European and
> > American colonialists and hegemonists.
>
> No? Many Japanese people have strong objections to the Han unification.

And many do not, as has already been pointed out on this list.

> (So do I, for that matter. I also disapprove of the sloppy use of the
> inaccurate and insulting term _ideographs_ for Chinese characters. But
> I digress.)

Properly used, the term "ideograph" is neither inaccurate nor insulting.
It is a widely understood compromise term for the lesser-known
term "logograph", which I presume you prefer. Only if you insist on
interpreting "ideograph" in its strictest etymological sense and
start to pick apart all of the hanzi according to their various
and sundry graphological histories is "ideograph" inapplicable to most
hanzi. But then you would never make the mistake of misattributing
narrow interpretive usages of a word to those who are making generic
usage of it, now would you?

> Back when that decision was being forced on everyone, the
> attitude of its powerful proponents was "These @#$& Japanese won't
> listen to reason".

As John Jenkins has pointed out, the bulk of the unification decisions
were made by the IRG. They weren't forced on everyone by Western
corporations. The most important single player in this process
is the Chinese national standards body, but all of the Asian
national bodies have had continuous input.

>
> Japanese, Chinese, Korean, and Arabic are only "privileged" because of
> their commercial importance to the West. Hindi, with far more speakers
> than Japanese, is not similarly privileged because until recent years the
> computer industry gleefully (and, I'm sorry to say, correctly) assumed
> that most Hindi-speakers who were able to purchase and use computers would
> be willing or even happy to do their work in English.

You seem so enamored of the grand concept of the Western domination
of everybody that you miss the point about the local hegemonism
that infects the local information industries, which are not
inconsiderable and continue to grow.

>
> > It is just that some of us happen to believe that the
> > particular *majority* script known as Latin
>
> Majority? Yes, if dollars are the unit of measure.

I was referring to Latin as a majority script, as opposed to the
minority scripts under consideration now like Cham, Dai, and Hmong.
If you would like me to be more explicit, I also consider Han, Cyrillic,
Devanagari, and Arabic to be *majority* scripts -- which renders moot
your subsequent calculations.

> But no more than
> a few tens of millions of people in China and India are native speakers
> of languages that are usually written in the Latin script. That's more than
> two billion native speakers of languages written in other scripts.
> Add Japan, Korea, Bangladesh, Thailand, Myanmar, Pakistan, and most of
> Russia, and we already have a majority of the world's population without
> even leaving northern and eastern Asia (OK, and a little piece of Europe).
>
> > is *already* fully
> > encoded (in fact, re-encoded over and over: there are 1001 LATIN
> > letters now in Unicode 3.0, with more coming in Unicode 4.0!) according
> > to its native script principles.
>
> What evidence is there for that claim? Are you willing to bet that there
> isn't a single letter in the Latin script that cannot be produced from
> letters, combining diacritics, and other devices already found in Unicode?

I'm willing to state that the vast proportion of everything in Latin
that needs to be represented is already representable.
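As a small illustration of that point (my own sketch, using Python's standard unicodedata module): a Latin letter with diacritics need not be encoded precomposed, since a base letter plus combining marks is canonically equivalent to the precomposed form under Unicode normalization.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT normalizes (NFC) to the
# single precomposed character U+00E9 LATIN SMALL LETTER E WITH ACUTE.
decomposed = 'e\u0301'
composed = unicodedata.normalize('NFC', decomposed)
assert composed == '\u00e9'

# NFD runs the other way, splitting the precomposed form back apart.
assert unicodedata.normalize('NFD', '\u00e9') == 'e\u0301'
```

The same mechanism covers letter-plus-diacritic combinations that have no precomposed code point at all, which is why most orthographic needs in Latin are already representable.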

There are known specific holes. Michael Everson and the Finnish and
Swedish NB's are working on an explicit Finno-Ugric Phonetic Alphabet
proposal; we are aware that FUPA is not yet fully covered. But that is
the kind of place you have to go to find holes in the Latin coverage --
not national or minority language orthographies.

>
> > If you were working to establish an international web presence for
> > the Tamazight language, would you rather work with that ruthless
> > bastion of capitalist computer industry hegemonism, the Unicode
> > Consortium (which "doesn't give a tinker's damn about your [language]"),
>
> There's no need for sarcasm.

Hm. It seems to be the only thing that communicates.

> I'm not doubting the sincerity of (most,
> or at least many) members of the Unicode Consortium. I'm just pointing out,
> as Gregg Reynolds did so well yesterday, that the political nature of
> Unicode is often ignored, perhaps deliberately, in the course of routine
> technical work.

There is *always* a political side to any activity, and perhaps
especially to technological development. I don't deny that. And the
development of a universal character encoding is no exception. It has
quite apparently been riddled with language politics from the beginning.
And there are many other implications as well. So here I agree with you.

It is just that the technical discussions do not automatically need to
be steeped in the politics. They can and will continue on their own
technical merits, and it is not clear that this list is the best place
to argue and discuss the political side of this technology.

>
> > Or should you
> > depend on the tender mercies of the Algerian government to represent
> > your character encoding interests to the appropriate ISO committe to
> > get precomposed Latin characters for Tamazight encoded so you can
> > be a "first-class citizen" in the international standard?
>
> Who's to say that I'd want precomposed Latin letters? Perhaps I'd use
> the Tamazight or Arabic script.

That's the *Tifinagh* script. And no you wouldn't. The actual Tamazight
language promotion efforts on the web are working to standardize the
Latin form. Look them up. The Tifinagh script has a certain iconic
importance to Berbers, and does appear, but is mostly of scholastic
interest, and is not being pushed strongly by those building Tamazight
websites.

>
> No, I wouldn't expect the Algerian government to help with Tamazight.
> But nor would I expect enthusiastic coöperation from the computer
> industry.

But you might be surprised by the response from the Unicode list, then.

--Ken

>
> Scott Horne
