Re: MES as an ISO standard?

From: Adrian Havill
Date: Wed Jul 02 1997 - 22:28:10 EDT

Unicode Discussion wrote:
> It's not a low priority for Japanese programmers.

I'm glad someone pointed this out. Although C (and Java) keywords are in
English (this is debatable (^_^)), a programmer of any native language
can memorize a handful of reserved words. Just because you're a
programmer doesn't mean you can read or write English.

I can attest that nearly every programmer I've met in Japan keeps an
English-to-Japanese dictionary by their terminal (or online in a
window), so they can understand source code with variable names like

A quick glance at Dr. Dobb's, JavaWorld, C Programmer (the Japanese
versions of the mags) shows that articles and source code are being
written all the time to take advantage of the fact that "native language
source has arrived."

> With the new Japanese JDK, Japanese programmers can write their source
> code in Shift-JIS, using the real Japanese names for things, and the
> compiler will convert the identifiers into UTF-8 in the class files.
> Very convenient.

It still has a long way to go, though. Even in the "improved
internationalization" JDK 1.1.2, package names and class names must
still be 8-bit-- the compiler (Sun's Solaris version, at least) lops off
the top 8 bits of every Unicode character before writing the class to
disk. If you localize JDK 1.1.1 to Japanese, it gets worse. The Unicode
class name is converted to EUC-JP (in the case of Solaris). However,
every character in a class/package name that it can't convert gets
mapped to the question mark ("?"), creating a many-to-one mapping.

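To make the two failure modes above concrete, here is a minimal sketch
(not the JDK's actual code). The class and method names are made up for
illustration, and US-ASCII stands in for EUC-JP as "a charset that can't
represent the character", which is an assumption of the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not taken from any JDK source.
class ClassNameMangling {
    // Keep only the low 8 bits of each UTF-16 code unit.
    static String dropHighByte(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            sb.append((char) (name.charAt(i) & 0xFF));
        }
        return sb.toString();
    }

    // Encode to bytes and decode again. String.getBytes replaces every
    // unmappable character with the charset's replacement byte, which
    // is '?' for US-ASCII (used here as a stand-in, by assumption).
    static String roundTrip(String name, Charset cs) {
        return new String(name.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // U+3042 and U+4E42 share the low byte 0x42, so both become "B".
        System.out.println(dropHighByte("\u3042")); // prints B
        System.out.println(dropHighByte("\u4e42")); // prints B
        // Two different characters both collapse to '?'.
        System.out.println(roundTrip("\u3042\u3044", StandardCharsets.US_ASCII)); // prints ??
    }
}
```

Either way, two distinct class names can end up as the same file name
on disk.
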
Not to mention that classloaders have no idea how to load non-ASCII
names (especially via HTTP-- I know this isn't a Java problem, but an
HTTP protocol one). The Win32 JDK does not take advantage of NTFS's
Unicode-capable file names (I've heard that the extended FAT can
partially support Unicode-based file names as well). The many
suggestions in the "Java Specification" for mapping Unicode package
names (using "@" plus a 4-character hex notation), etc., still seem to
be ignored by current
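
As a rough sketch of the "@" plus 4-character hex idea: the class and
method names below are hypothetical, and the choices to escape
everything outside ASCII and to use lowercase hex digits are my
assumptions, not something any JDK implements.

```java
// Hypothetical helper; illustrates the "@xxxx" mapping only.
class PackagePathEscaper {
    // Map a package name to a directory path, writing each non-ASCII
    // character as '@' followed by four hex digits.
    static String toDirectoryName(String packageName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < packageName.length(); i++) {
            char c = packageName.charAt(i);
            if (c == '.') {
                sb.append('/');            // package separator becomes a path separator
            } else if (c < 0x80) {
                sb.append(c);              // ASCII passes through unchanged
            } else {
                sb.append('@').append(String.format("%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints crafts/papierM@00e2ch@00e9
        System.out.println(toDirectoryName("crafts.papierM\u00e2ch\u00e9"));
    }
}
```

Unlike the "?" substitution, this mapping is one-to-one, so a loader
could reverse it.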

Plus, the "convert from native to escape codes, then compile" step is a
hassle-- esp. because tools such as native2ascii (Unicode to
whatever) are not very robust and don't handle errors gracefully... they
quietly "give up" and stop converting without a diagnostic whenever
they encounter anything they don't understand or can't convert.
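
For comparison, here is a sketch of a converter that fails loudly
instead of giving up quietly. It is not native2ascii; the class name is
made up, and it leans on java.nio.charset, which is a later API than the
JDK versions discussed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical strict converter from native bytes to escaped ASCII.
class StrictNativeToAscii {
    static String convert(byte[] input, Charset cs) throws CharacterCodingException {
        // REPORT makes the decoder throw on bad input rather than
        // silently substituting or stopping.
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharSequence text = decoder.decode(ByteBuffer.wrap(input));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);                                 // ASCII as-is
            } else {
                sb.append(String.format("\\u%04x", (int) c)); // escape the rest
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] good = "int \u3042 = 1;".getBytes(StandardCharsets.UTF_8);
        System.out.println(convert(good, StandardCharsets.UTF_8));
        byte[] bad = {0x61, (byte) 0xFF};   // 0xFF is never valid in UTF-8
        try {
            convert(bad, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.err.println("conversion failed: " + e);
        }
    }
}
```

The charset is a parameter, so the same sketch would apply to Shift-JIS
or EUC-JP input wherever the runtime provides those charsets.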

Get an error in the compiler, and you get a message "error in class
\ublah\ublah\ublah" (unless you're using a Japanese localized JDK). You
should be able to specify the conversion for compiler output.
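
Until the compiler does that itself, a small filter over its output
could turn the escapes back into readable characters. A sketch, with a
hypothetical class name and an assumed message format:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processor for compiler diagnostics.
class DiagnosticUnescaper {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String message) {
        Matcher m = ESCAPE.matcher(message);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or backslash in the result.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Whether the result displays correctly depends on the console encoding.
        System.out.println(unescape("error in class \\u3042\\u3044"));
    }
}
```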

The compiler should also accept at least UTF-8 directly by default (and
perhaps UTF-7 and UCS-2 via a command line option), and let you specify
the output character set, in addition to the traditional backslash-u
Unicode escape code notation, so that developers with Unicode editors
can bypass the "charset convert" cycle of the compile. This is a
relatively simple addition to the compiler that would be backward
compatible with ASCII source files.
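
The backward compatibility claim is easy to check: pure ASCII text
encodes to the same bytes in UTF-8. A small sketch (the class name is
made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check that ASCII source is byte-identical in UTF-8.
class Utf8AsciiCompat {
    static boolean sameBytes(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII),
                             s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("class Foo {}"));  // prints true
        // U+3042 is '?' in US-ASCII but three bytes in UTF-8.
        System.out.println(sameBytes("\u3042"));        // prints false
    }
}
```

So a compiler that reads UTF-8 by default would read existing ASCII
sources, backslash-u escapes included, exactly as before.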

Javasoft engineers reading this ML, are you listening? :-)

Adrian Havill <URL:>
Engineering Division, System Planning & Production Section
