Re: Unicode Support in Java

From: Glen C. Perkins (Glen.Perkins@NativeGuide.com)
Date: Fri Mar 22 1996 - 19:22:23 EST


>>We are having trouble with the display of Unicode strings from
>>programs written in Java. We were encouraged by the fact that
>>characters and strings in Java are Unicode, and expected Java
>>programs to be able to display the characters when running on
>>Windows NT, which has extensive support for Unicode (for example
>>Notepad on Windows NT can display and edit Unicode files).
>>
>>However, when running Java applets in the applet viewer or in
>>Netscape Navigator 2.0, the most significant byte of each Unicode
>>value is ignored, and the characters are all displayed as Latin-1.
>
>The Java language handles Unicode characters just fine. However, the Java
>language libraries (which display text on the screen, etc.) truncate
>everything to ISO 8859-1 right now. Javasoft is aware of this. Both they
>and Netscape need to do some work to support Unicode properly. Remember,
>Java and Netscape are cross-platform products, so they are trying to find
>solutions that work on all platforms, not just NT. It's considerably
>harder on platforms with no Unicode support.
>
>
>David Goldsmith

As David says, the basic Java language specifications are not the problem.
Unicode is the native character encoding for all source files, "char"s,
Strings (constant string type) and StringBuffers (mutable string type), and
both UCS-2 and UTF-8 are directly supported (and preferred) for all
datastream I/O, both locally and over the net via URLs.
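
To make the two stream encodings concrete, here is a minimal sketch using java.io.DataOutputStream, whose writeChars method emits each char as two bytes, high byte first (UCS-2 style), and whose writeUTF method emits a two-byte length followed by Java's modified UTF-8. The class name DataStreamEncodings and its helper methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DataStreamEncodings {

    // Encode each char as two bytes, high byte first.
    public static byte[] asUcs2(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeChars(s);
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked if it does.
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Encode as a two-byte length prefix followed by modified UTF-8.
    public static byte[] asUtf(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Render bytes as space-separated lowercase hex for inspection.
    public static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(asUcs2("\u1234"))); // prints "12 34"
        System.out.println(hex(asUtf("\u1234")));  // prints "00 03 e1 88 b4"
    }
}
```

Both forms keep the full 16-bit value, so any loss happens later, on the way to the screen.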

The current Sun implementation lags the specs considerably, unfortunately.
Despite the fact that all source files are required to be written in
unicode or 'convertible to unicode', and your variable names could be kanji
and your class names hangul, the only source encoding currently supported
is ISO 8859-1. If you try to insert a non-Latin unicode char via the
built-in escape mechanism:

String fatChanceStr = "\u1234 is an Ethiopian character";

and then try to do anything with it, it will almost certainly get lost,
mangled, or truncated along the way, either by an incompletely implemented
standard class (such as java.io.PrintStream) or by something else en route
from binary encoding to rendered pattern of pixels on the screen.
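
The truncation reported at the top of this thread (the most significant byte of each value being ignored) can be modelled in a few lines. This is only a sketch of the observed effect, not the actual library code; the class TruncationDemo and its method names are hypothetical.

```java
public class TruncationDemo {

    // Keep only the low byte of a 16-bit char, as a Latin-1-only output path would.
    public static char truncateToLatin1(char c) {
        return (char) (c & 0xFF);
    }

    // Apply the same truncation to every char in a string.
    public static String truncateToLatin1(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            out.append(truncateToLatin1(s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fatChanceStr = "\u1234 is an Ethiopian character";
        // The string itself still holds the full 16-bit value.
        System.out.println(Integer.toHexString(fatChanceStr.charAt(0))); // prints "1234"
        // After truncation only 0x34 survives, which is the digit '4'.
        System.out.println(truncateToLatin1(fatChanceStr)); // prints "4 is an Ethiopian character"
    }
}
```

Under this model the string is intact in memory and the damage occurs only at the output step, which matches the distinction between the language and its libraries drawn above.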

The minor potholes will be filled in gradually, but Java still desperately
needs two things:

1) Java, like all unicode-based systems, needs more unicode support from
the operating systems on which it runs, and

2) Java needs a platform-independent font system with provision for
just-in-time delivery of the specific glyph definitions needed to render a
particular selection of unicode-encoded text.

At the heart of the furor over Java is its promise to be able to deliver
safe executable code on a just-in-time basis that will run identically on
*any* platform including an entire menagerie of handheld devices still on
the drawing boards. Fundamental to that task is the ability to deliver
various resources needed by the program: sound, pictures, animations, video
clips and...glyphs that will perform identically (ideally) on all
platforms. Even if your PC or PDA is unicode-savvy, I can't assume that you
have pre-installed the correct old Gothic German font or the cartoonish
Hangul font that my program needs in order to work correctly, any more than
I can assume that you already have a picture of the Swiss Alps installed in
your system.

Java needs the ability to ask your system if it already has a specific
font, and if not, if it has a font of a certain type containing the
necessary glyph definitions (if that's acceptable), and if not, to deliver
the specific glyphs (usually not a complete font) necessary to display what
it wants to display.

Fortunately, HTML has exactly the same needs for platform-independent fonts
that allow for just-in-time glyph delivery, and HTML is even harder to
ignore than Java. Adobe, Netscape and Apple just announced a technology of
this sort based on Adobe's Portable Document Format/"Amber" technology.
Unfortunately, I haven't been able to find out from Adobe whether this
technology will apply to any but single-byte fonts. "Acrobat," the
precursor to Amber, has such a limitation (to my knowledge).

In discussions with the Java team at Sun, when I asked them whether they
were going to incorporate a version of this technology into the "Java font
system" they obviously need, they wouldn't say any more than "we're talking
to the people you would expect us to be talking to in order to solve this
problem." I also submitted a written proposal for a Java syntax that would
support a hierarchy of "if you have this specific font use it; if not, (and
if you can be flexible) use any font in such-and-such category containing
the needed glyph definitions; otherwise, download glyphs from this URL:
http://www...; and if you can't, throw a missingFont exception", but it
appears to have disappeared into a black hole at the center of Sun. ;-)
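
The fallback hierarchy in that proposal might look roughly like the following. Everything here is hypothetical: no such Java API exists, and the types InstalledFont, GlyphDownloader, and MissingFontException, along with the resolve method and its return strings, are invented purely to show the order of the checks.

```java
import java.util.List;
import java.util.Set;

public class FontFallbackSketch {

    /** Hypothetical exception, named after the "missingFont" exception in the proposal. */
    public static class MissingFontException extends RuntimeException {
        public MissingFontException(String message) {
            super(message);
        }
    }

    /** Hypothetical description of one font already installed on the host. */
    public static final class InstalledFont {
        final String name;
        final String category;
        final Set<Character> glyphs;

        public InstalledFont(String name, String category, Set<Character> glyphs) {
            this.name = name;
            this.category = category;
            this.glyphs = glyphs;
        }

        // True if this font has a glyph for every char in the text.
        boolean covers(String text) {
            for (int i = 0; i < text.length(); i++) {
                if (!glyphs.contains(text.charAt(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Hypothetical hook that fetches only the glyphs needed for the given text. */
    public interface GlyphDownloader {
        boolean fetch(String url, String text);
    }

    // Returns a label describing which step of the hierarchy satisfied the request.
    // Pass null for acceptableCategory if no substitute is acceptable,
    // and null for glyphUrl if there is nowhere to download glyphs from.
    public static String resolve(String wantedFont, String acceptableCategory, String glyphUrl,
                                 String text, List<InstalledFont> installed,
                                 GlyphDownloader downloader) {
        // 1) Use the specific font if the system already has it.
        for (InstalledFont f : installed) {
            if (f.name.equals(wantedFont)) {
                return "installed:" + f.name;
            }
        }
        // 2) If the caller is flexible, accept any font in the category with the needed glyphs.
        if (acceptableCategory != null) {
            for (InstalledFont f : installed) {
                if (f.category.equals(acceptableCategory) && f.covers(text)) {
                    return "category:" + f.name;
                }
            }
        }
        // 3) Otherwise try to download just the glyphs required.
        if (glyphUrl != null && downloader.fetch(glyphUrl, text)) {
            return "downloaded:" + glyphUrl;
        }
        // 4) Nothing worked.
        throw new MissingFontException("no usable font for: " + wantedFont);
    }
}
```

The point of the ordering is that a download happens only when nothing already on the machine will do, and that only the glyphs actually needed are requested.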

I'll be talking to Arthur van Hoff (until recently, one of the principal
creators of Java) in a few days, and I'll try to find out more. He was less
than enthusiastic about Java's original decision to go with unicode the
last time I spoke with him and seemed to consider unicode quite an
unnecessary burden. I also spoke to one of his colleagues who told me that
the Java Team "wasn't exactly overflowing with unicode expertise" and
didn't quite know where to begin solving some of the problems. Therefore,
he said, "unless someone wants to step forward and volunteer" to help them
out, they would just put a lot of this "unicode business" on a back burner.
My impression was that he felt he wasn't getting much help from the
Unicode Consortium, and that he could have used it.

I would like to add that Java is potentially the best thing that has
happened to Unicode since the creation of the Unicode Consortium, and I am
surprised at how little effort I have seen to hitch unicode to the rising
star of Java. Among serious Java programmers (at conferences, user groups,
and net newsgroups, for example) the questions regarding how to "display
Japanese or Chinese using unicode" or "what is this 'UTF-8' and when should
I use it?" are non-stop. The unicode-basis of Java inspires them to learn
about unicode and try to use it for the first time to write more
internationally-savvy code. (Then they hit all the potholes, conclude they
never should have tried this new road to begin with, and begin asking for
"workarounds" that will allow them to avoid unicode as much as possible.)

I'm probably just unaware of massive, focused, and coordinated efforts
being made by the "Big Guys" at Sun, Netscape, Microsoft, Apple, Adobe,
Macromedia and others behind closed doors. In case not, though, I hope that
the Unicode Consortium will take full advantage of the current excitement
over Java to provide some assistance to its implementors to make sure that
legions of new Java programmers find unicode to be much more of a solution
than a problem. If so, Java and HTML may become the vehicles that finally
make unicode support a mandatory OS feature.

Glen C. Perkins
CEO, Native Guide Software
http://www.NativeGuide.com


