The Grand Compromise [was: Cultural bias]

From: Kenneth Whistler
Date: Mon Jan 13 1997 - 15:38:42 EST

The recent discussion has highlighted the fact that
users of Hindi, Arabic, and Irish *all* have their
bones to pick with ISO 10646/Unicode. And frankly,
that is as it should be.

Unicode was never intended to be the perfect, most
polished encoding solution for any single language
or script. It is the grand compromise encoding to
address the problem of encoding *all* of the world's
scripts in a single, universal character set.

The advantages of a universal encoding should be
apparent to all. But, as for any compromise, it can
always be attacked by purists concerned only with
a single aspect of the problem. Such criticisms will
have more weight if people take the time to consider
two things before wading in about how lousy Unicode
is for this or that script:

1. REALITY. ISO 10646 is an accepted international
standard. It isn't going to go away, and it is being
implemented in many products. Proposals for improvements
and additions are generally in order; complaints about
the principles followed in already-encoded scripts are
basically moot.

2. LEGACY. Many parts of ISO 10646 got to be the way
they are because of the legacy of existing standards,
of existing implementations, and because of the complex
history of rendering various writing systems of the world.
Arguments which proceed philosophically based on first
principles for a particular writing system are often
rejected out of hand by those working on the standards
if they show no sensitivity to the legacy issues which
drive much of the actual work on standards.

And regarding Sakamura-san's comment, quoted by Arun Gupta:

> These compromises, moreover, were made not from a neutral standpoint
> but with the linguistic biases of people in the Latin language sphere
> (especially the English language sphere)."

The details of such claims of cultural bias aside, consider
the following. By far the single most significant impediment
to the implementation of Unicode is the move from an 8-bit
character to a 16-bit character. The 8-bit character was
basically working o.k. for English-speaking countries and Western Europe,
but was not working well for other character sets and for multilingual
text. But moving to a 16-bit character engenders an unbelievable
amount of pushback and objections: from English-speaking
engineers, engineering managers, and product managers who do
not see the larger context of the problem of internationalization
of software. From their point of view, the non-Western, multilingual
bias of Unicode just gets in the way of quick, cheap rollouts of
software for the European and Japanese markets.

As far as I am concerned, it is a *good* thing that Devanagari and
Tibetan are in the international standard, and that we are currently
working through the issues of encoding Mongolian and Yi.

--Ken Whistler
