Corporate influence on Unicode development (long)

From: Doug Ewell (dewell@adelphia.net)
Date: Sun Jul 21 2002 - 20:30:33 EDT


I don't know if William Overington is still a subscriber to this mailing
list -- he may have gone away to find (or form) a new group more
sympathetic to his "novel" applications of Unicode -- but one of the
issues he raised about two weeks ago, right about the time the
chromatic-code and precomposed-ligature debates were coming to a head,
was an insinuation that Unicode is unduly influenced by large corporate
interests.

William based this claim, at least in part, on the $12,000 fee required
for "full" membership in the Unicode Consortium, a membership level
described on the Unicode Web site as being appropriate for "your company
or organization" rather than for individuals.

I promised (or maybe threatened) to discuss this issue, from the
standpoint of an individual who is interested in Unicode but has yet to
join the Consortium due to financial considerations.

First of all, the figure that William (or any other individual) really
should be looking at is not $12,000 for a full membership, but $600 for
a "specialist" membership or $120 for an "individual" membership. (BTW,
I would be interested in hearing -- perhaps off-line -- from individuals
who hold or have held such memberships, to find out how they felt their
memberships benefited them and Unicode.)

Second, many good ideas have come from this list, and we have to assume
that UTC listens to some of them and can be influenced by some of them.
It wouldn't be smart to ignore truly good ideas just because they come
from a free mailing list.

Some list members have already pointed out that the character repertoire
of Unicode/10646 can hardly be said to be reflective of the interests of
big business. It is hard to imagine how big business would have
benefited from "pushing through" scripts like Tagbanwa or Old Italic, or
non-script blocks like Byzantine Musical Symbols. The American
Mathematical Society, largely responsible for the big chunk of math
symbols added to Unicode 3.1, doesn't seem like a stereotypical "large
corporate interest" either.

Indeed, if big business interests were at the heart of the Unicode
character repertoire, we would probably be seeing a lot more of the
precomposed ligatures that William favored so strongly. They would have
given Microsoft and Apple a cheap, easy way to claim "support" for
ligatures without the additional pain and complexity of performing
ligation in a more general, productive way.

And in fact, I had originally planned to write this post to debunk the
entire notion that corporate interest plays any part at all in the
development of Unicode.

But there's more to Unicode than its character repertoire; as Ken and
others remind us, character properties and technical reports and usage
guidelines are what separate Unicode from 10646. And it is here that
some corporate influences do appear to seep in, and where the Consortium
and UTC may want to be careful to avoid either the appearance or the
reality of inextricable corporate tie-ins to Unicode.

The precomposed-ligature debate brought forth several responses phrased
in terms of, "No, you don't need a precomposed ligature at U+E7xx, or
even a ZWJ hint, because Technology Such-and-So will automatically
handle it." Technology Such-and-So could be an application like
InDesign or FrameMaker, or it could be a font architecture like OpenType
or AAT; in the latter case there were frequent discussions of GSUB and
GPOS tables and <rlig> features, as though those were part and parcel of
Unicode. In either case, one could reasonably infer that a particular
vendor's product or a particular technology is necessary to implement
some aspect of Unicode properly, which isn't -- or shouldn't be -- the
case.

Just today (Sunday), Mark Davis responded to a question about the
Unicode Collation Algorithm in part by pointing out how ICU ("a
particular implementation of the UCA") solves the problem. The solution
was followed shortly by links to ICU-related sites. Now, even though
ICU is an open-source library and thus not a money-making product of
IBM, and even though ICU may be easy to use and may greatly facilitate
the use of UCA, it's still important to realize that neither UCA nor any
other aspect of Unicode *requires* ICU. I could roll my own UCA
implementation if I wanted to, and assuming it was correct and followed
the Unicode Standard and UTS #10, it would be just as legitimate and
just as "Unicode" as if I used ICU or any other library or tool.

The Unicode FTP site includes sample implementations for algorithms such
as UTF-8, SCSU, UCA, and the Bidi algorithm. (UTF-7 was once on this
list as well; thankfully, nobody talks about UTF-7 much any more.) At
some point, the Binary-Ordered Compression for Unicode ("BOCU")
algorithm -- implemented in ICU and already mentioned in the SCSU
Technical Standard, despite having no official status in Unicode -- may
be added to this list as well. It would be highly desirable for Unicode
to continue to provide reference implementations rather than directing
users to proprietary implementations on other companies' Web sites, to
avoid the perception that these Unicode algorithms require the use of
corporate products.

Some algorithms described in Unicode Technical Reports, such as
UTF-EBCDIC and CESU-8, were quite obviously promoted by corporations
that would stand to benefit from their adoption. In each case, however,
the algorithms (however simple) are completely specified in the TR,
without any requirement to rely on (e.g.) IBM's or Oracle's products to
get the job done. I've implemented both of these on my own (holding my
nose in the case of CESU-8), partly to uphold my belief that it should
be possible to implement *anything* in Unicode without relying on any
vendor's product.
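
To illustrate how little is needed, here is a minimal Python sketch of a
CESU-8 encoder. It assumes the usual reading of the Technical Report:
each UTF-16 code unit is serialized with the ordinary one- to three-byte
UTF-8 bit layout, so a supplementary character becomes two three-byte
sequences (one per surrogate) instead of a single four-byte sequence.
The function name cesu8_encode is hypothetical and not taken from any
library or from the TR itself.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode a string as CESU-8 (illustrative sketch, not a reference implementation)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP character: a single UTF-16 code unit.
            units = [cp]
        else:
            # Supplementary character: split into a UTF-16 surrogate pair.
            cp -= 0x10000
            units = [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]
        for u in units:
            # Serialize each 16-bit code unit with the UTF-8 bit layout.
            if u < 0x80:
                out.append(u)
            elif u < 0x800:
                out.append(0xC0 | (u >> 6))
                out.append(0x80 | (u & 0x3F))
            else:
                out.append(0xE0 | (u >> 12))
                out.append(0x80 | ((u >> 6) & 0x3F))
                out.append(0x80 | (u & 0x3F))
    return bytes(out)


if __name__ == "__main__":
    # BMP text comes out identical to UTF-8; U+10400 takes six bytes, not four.
    print(cesu8_encode("A\u00e9\u20ac").hex())
    print(cesu8_encode("\U00010400").hex())
```

For text confined to the BMP the output is byte-for-byte the same as
UTF-8; the two forms differ only for supplementary characters.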

In fact, I would have sworn there was something in the Unicode Standard
itself to the effect that Unicode "shall not require the use of any
particular vendor's" products or tools. But I looked through the 3.0
book and couldn't find any such claim. Was this removed sometime in the
last 10 years, or am I just imagining it?

Corporate references certainly aren't a bad thing in and of themselves.
We need to know, and perhaps more importantly *others* need to know, how
successful Unicode is in terms of its adoption in important software
systems. The more people know about the level of Unicode support in
operating systems and applications by Microsoft, Apple, Oracle, etc.,
the more positively that will reflect on both Unicode *and* the
products. But it's also important to know that you, or I, or Fred's
Software Solutions and Sports Bar can implement the Unicode Standard
just as well as the big boys, without undue dependence on the big boys.
That has to be the perception as well as the fact.

-Doug Ewell
 Fullerton, California


