Re: Unicode in Java

From: Glen C. Perkins (glen@MediaCity.com)
Date: Mon Sep 09 1996 - 01:33:24 EDT


unicode@Unicode.ORG wrote:
>
> >Can you give me a reference for this on the Web? What subset of Unicode
> >characters is supposed to be supported now? When did the change become
> >official? Are there any implementations of the new version? I have been
> >going through the Java documentation at Sunsoft, which refers to Unicode
> >character strings in several places, but gives no information that I can
> >find on the relevant version of Unicode.
>
> The Java JDK 1.0.2 implementation from Javasoft truncates everything to
> ISO-8859-1 when drawing strings, because the implementation couldn't
> handle anything else. This is mentioned in the Java 1.0.2 API
> specification, which is available both at Javasoft's web site
> (http://www.javasoft.com/) under the "Developer" section, and as a book.
> The file I/O stuff that reads and writes UTF-8 works correctly, though.
>
> This will be fixed in the JDK 1.1 release from Javasoft. Also, Netscape
> Navigator 3.0 does a much better job now. It handles most of the scripts
> the host OS supports now. The Mac version messes up if you have both
> Chinese and Japanese installed; it doesn't handle the non-Japanese
> Chinese characters correctly. Still, it handles Latin, Cyrillic, Symbols,
> Japanese, and the compatibility area pretty well.
>
> It's hard to implement Unicode rendering on platforms that don't support
> Unicode directly, so it's taking a while for vendors to implement this
> correctly.
>
> David Goldsmith
> International Software Architect
> Apple Computer, Inc.
> goldsmith@apple.com

As David says, there are a lot of single-byte assumptions in various
text handling routines in the standard libraries as implemented by Sun
in 1.0.2. I haven't been able to get anyone at Javasoft to make a
concrete statement such as "we've tested the entire 1.1 base API using
Japanese [or Chinese, etc.] text, found every method that failed
(StringTokenizer etc.), and fixed them all for 1.1."
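
For what it's worth, here is a minimal sketch of the kind of check I
have in mind. The class name CjkSmokeTest and its helper methods are my
own invention for illustration, not anything from Javasoft. It pushes
Japanese text through StringTokenizer and through the writeUTF/readUTF
pair in java.io, and also shows what a single-byte assumption does to
the same string. A real test pass would have to cover far more of the
base API than this.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.StringTokenizer;

// Hypothetical smoke test; the names are illustrative only.
class CjkSmokeTest {
    // Japanese sample words written as Unicode escapes so the source stays ASCII.
    static final String NIHONGO = "\u65E5\u672C\u8A9E";
    static final String TESUTO = "\u30C6\u30B9\u30C8";
    // Ideographic space (U+3000), used here as a non-Latin-1 delimiter.
    static final String WIDE_SPACE = "\u3000";

    // Splits on the ideographic space and checks both tokens come back unchanged.
    static boolean tokenizerKeepsCjk() {
        StringTokenizer st =
            new StringTokenizer(NIHONGO + WIDE_SPACE + TESUTO, WIDE_SPACE);
        return st.countTokens() == 2
            && st.nextToken().equals(NIHONGO)
            && st.nextToken().equals(TESUTO);
    }

    // Writes the string with writeUTF, reads it back with readUTF, and compares.
    static boolean utfRoundTripKeepsCjk() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(NIHONGO);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return in.readUTF().equals(NIHONGO);
        } catch (IOException e) {
            return false;
        }
    }

    // Simulates a single-byte assumption: keeps only the low 8 bits of each char.
    static String truncateToLatin1(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            sb.append((char) (s.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("StringTokenizer: "
            + (tokenizerKeepsCjk() ? "ok" : "FAILED"));
        System.out.println("writeUTF/readUTF: "
            + (utfRoundTripKeepsCjk() ? "ok" : "FAILED"));
        System.out.println("survives 8-bit truncation: "
            + truncateToLatin1(NIHONGO).equals(NIHONGO));
    }
}
```

Any routine that behaves like truncateToLatin1 internally would leave
plain ASCII untouched and mangle the Japanese, which is why testing with
English text alone tells you nothing here.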

All they have been able (or willing) to tell me so far is that there
are a number of improvements in the base API implementations that will
"help with localization issues." They have also admitted to being under
a great deal of pressure from "very important Japanese parties" to
eliminate all single-byte text assumptions everywhere in the Sun
implementations of the base classes.

Symantec, however, is currently shipping a Japanese version of Cafe
(their Java development environment), and they claim that its Unicode
support is a lot more complete than Sun's. So much so, in fact, that
they say you can feel free to do CJK Java development in the US-English
version of Cafe and it will work just fine. I haven't tested this, but
I'm glad to hear them at least claim it.

__Glen Perkins__


