Re: Internet, HTML, Java & Unicode

From: Glen C. Perkins
Date: Thu May 09 1996 - 15:41:50 EDT

David Goldsmith opines:

>On another topic, I think part of the reason it has taken the Internet
>community so long to consider using Unicode is because the Consortium has
>not been aggressive about pushing Unicode on the Internet. It's all been
>done as volunteer efforts by disconnected people. Progress is starting to
>occur, but it's taken a while, and people are still leery about
>supporting Unicode in Internet products.

Absolutely! "Leery" is exactly right. I can't count the number of times
I've spoken to a programmer in some major company working on an
internet-focused project (web browser, Java implementation, etc.) who, when
questioned about unicode support, answered that unicode support would not
be included for a while because nobody in the development group even knew
where to begin with "all that arcane unicode stuff."

The developers of mainstream internet products are mostly C++ programmers
with datacom experience who are almost uniformly under the assumption that
if you don't have a Ph.D. in linguistics, you'll just screw things up if
you try to implement unicode, and "since nobody else is doing it yet" they
figure (with obvious relief) that they'll be much better off if they don't
do it, either.

I can't help but think that the "marketing" of unicode would benefit
greatly by having some "unicode support toolkits" for all major platforms
placed in the public domain and accompanied by some good "how to implement
unicode" books with that code on CD-ROMs in the back of the book.

>The adoption of Unicode by Java
>has helped more than any other recent event (never mind that the current
>Java distribution truncates it all to 8859-1...).

Again, absolutely right! (Not that I'm surprised, of course, David. ;-) )
One of the hottest newsgroups for developers right now is,
and the questions regarding how to use unicode are coming up constantly
from people who have been programming for years and never considered using
unicode until Java came along. Java is the best marketing opportunity ever
for unicode (although unicode-HTML would give it a run for its money if it
were to materialize).
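
For anyone wondering what the truncation to 8859-1 mentioned in the quote
above costs in practice, here is a minimal sketch. It uses the
java.nio.charset API found in present-day Java rather than anything in the
current distribution, and the class and method names (Latin1Demo,
roundTripLatin1, fitsInLatin1) are made up for illustration. It shows the
general loss when text is squeezed through 8859-1, not the exact behavior of
Sun's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the names here are illustrative, not from any real toolkit.
public class Latin1Demo {

    // Encode the text as ISO-8859-1 bytes and decode it again.
    // Characters that have no 8859-1 encoding cannot survive the trip.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // True if every char lies in the 8859-1 range U+0000..U+00FF.
    static boolean fitsInLatin1(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String french = "caf\u00e9";               // "cafe" with e-acute, inside 8859-1
        String japanese = "\u65e5\u672c\u8a9e";    // three kanji, outside 8859-1

        // Report, for each sample, whether it fits and whether it survives the round trip.
        System.out.println(fitsInLatin1(french) + " " + roundTripLatin1(french).equals(french));
        System.out.println(fitsInLatin1(japanese) + " " + roundTripLatin1(japanese).equals(japanese));
    }
}
```

The French sample comes back intact, while the Japanese one does not, which
is the practical meaning of "8859-1 only" for anyone outside Western Europe.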

By the way, while it is true that the current Java *implementation* from
Sun can only handle 8859-1, Symantec's Cafe implementation has no such
limitation according to discussions I had with a couple of the developers
at Symantec a couple of weeks ago. (I have yet to verify this for myself,
though, so I'm going to take it with a grain of salt, but they are shipping
a Japanese version and claim that the US version will handle Japanese, too,
if your system supports Japanese, because "they are all just using

>The Internet standards
>process is driven by implementations; with the dearth of Internet
>software that uses Unicode, the community has been reluctant to accept it.

Yes. Again, I think this is where the unicode consortium could do a lot of
good by priming the pump with resources that made unicode implementation
significantly less intimidating.

>The fact that Unicode is starting to make headway in the Internet
>standards process is fine, but what will really drive it forward is to
>have mainstream mailers, browsers, and servers that support Unicode and
>take advantage of it.

Add to that public domain unicode-support class libraries, unicode-enabled
versions of popular text editors, and a handful of public domain unicode
fonts. On the net, "useful free stuff" tends to drive standards. ;-)

In the Java and HTML newsgroups, developers frequently dismiss unicode with
sarcastic remarks such as "oh, I seem to have misplaced my unicode text
editor," and "yeah, we don't need a firewall to protect our proprietary
information--we'll just do it in unicode!" With little system support for
unicode, application developers don't create applications that use it. With
few applications that can use it, why add unicode support to the system?

The chicken-and-egg problem remains, but I think the internet, with its
cross-platform emphasis (a Japanese PC is a platform, right?), increases
the pressure to solve it, if it could just get started.

Imagine if the unicode consortium could create (or see to the creation of)
"unicode kits" for most major platforms, including the fonts, and put them
up on various servers for downloading and also make them available on cheap
CD-ROMs. Other products such as web browsers, text editors, Java
implementations etc. could then say something to the effect of "full
support for unicode (requires The Unicode Consortium's Unicode Kit(tm) or
equivalent)". Imagine web pages that say (in Latin-1) "This page uses
Unicode! Click HERE if you don't already have a unicode kit."

The idea would be to have some standard bundle of system APIs and fonts
that application developers could rely on, making unicode support much
easier and less intimidating to add to their products. If the unicode kit
for the Mac were made with Apple, the one for Windows with Microsoft, etc.
then these OS makers would probably gladly incorporate the unicode kits
into future versions of their OS's. If an OS maker doesn't add enough
unicode support to an OS, then the kit supplies whatever is missing. The
unicode consortium could even certify a certain level of unicode support
with a formal seal of approval and open the implementation of these kits to
anybody who wants to play, as Sun does with Java implementors. It's just
another example of priming the pump--a powerful marketing activity in the
standards-dominated computing industry.

[As always, whenever I go out on a limb and make suggestions like these,
part of what I suggest turns out to be just plain foolish, part is a good
idea and is well underway (unbeknownst to me), and occasionally there is a
bit left over which is of some value. In the hopes that the last category
exists, I'll go ahead, hit the send button and submit my $0.02.]

__Glen Perkins__
Native Guide Software
