Re: java and unicode

From: Adrian Havill (havill@threeweb.ad.jp)
Date: Tue Feb 03 1998 - 20:44:12 EST


> Well... I always get a little rambunctious when people talk about some
> programming language spec supporting or not supporting i18n or bidi or
> whatever.

> Java is just a programming language
Not true, unless you work for a certain software firm in Redmond, Washington
(they do make nice mice and joysticks, too). It's an environment, but that
discussion is best left to a Java-oriented forum.

> (not manna from heaven as some
> people appear to believe).
True.

> So why should a programing language spec talk
> about anything other than the syntax and semantics of the language itself?

As parts of a programming language contain labels and comments that are
intended for humans, these parts DO need to talk about I18N. And as the
"character/string" is a basic element of most languages, and a character usually
holds human-language-related information, this too needs to be specified. In
Java, that means the 16-bit Unicode char, the 16-bit Unicode escape sequence for
use in source code, the mechanism for handling Unicode in class names (although
specified, I have seen ZERO implementations that actually deal with this), the
classification of which characters are considered "numerals" and which are
"digits" and how case sensitivity is handled, and what set of Unicode characters
make up an identifier.
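To make that concrete, here is a small sketch of mine (not from the original
post, and using the modern java.lang.Character API) showing how those
spec-level rules surface at runtime:

```java
public class SpecDemo {
    public static void main(String[] args) {
        // \u escapes are translated before lexing, so this is an
        // ordinary char literal.
        char alef = '\u05D0';  // HEBREW LETTER ALEF

        // Digit classification covers all scripts, not just ASCII '0'-'9'.
        System.out.println(Character.isDigit('\u0663'));    // ARABIC-INDIC DIGIT THREE -> true
        System.out.println(Character.digit('\u0663', 10));  // its numeric value -> 3

        // Which characters may start or continue an identifier
        // is defined by the language spec, not left to the compiler.
        System.out.println(Character.isJavaIdentifierStart(alef));  // true
        System.out.println(Character.isJavaIdentifierPart('-'));    // false
    }
}
```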

By just specifying what characters can be included in the source code, Unicode
has opened up wide doors for developers who have excellent skills but do not use
an ASCII-compatible system and/or are not comfortable with English.

There are still a few old compilers out there that will throw errors if
the comments or string literals are not in the ASCII printable range, forcing
programmers to either use English or use an awkward "romanized" version of their
language, because the spec did not talk about what happens in cases of non-ASCII
in the source code.

The definition in Standard C for wide character support was left so open as to
be almost useless for portable work... e.g. wchar_t has no useful guaranteed
minimum width (it is not necessarily 16 bits), etc.
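The contrast with Java can be made concrete: unlike wchar_t, the width and
meaning of Java's char are pinned down by the spec on every platform. A small
sketch of mine, using the modern Character constants:

```java
public class CharWidth {
    public static void main(String[] args) {
        // Java's char is defined to be exactly 16 bits everywhere;
        // nothing about its width is implementation-defined.
        System.out.println(Character.SIZE);             // 16
        System.out.println((int) Character.MAX_VALUE);  // 65535
    }
}
```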

As what goes into source code and identifiers is usually part of the spec, their
relationship to I18N needs to be defined.

> [snip]
> Support for BIDI has nothing to do with programming language specs.

It's true that BIDI support has more to do with the libraries than the specs
(although the specs do need to address things such as how BIDI is handled in
identifiers, etc., for the sake of comparison and processing).

90% of the Unicode stuff in Java (character conversion, etc.) can be
implemented in Java/C by programmers themselves. But why force the programmer to
re-invent the wheel? If it weren't for certain firms (certain MS operating
systems) virtually FORCING Unicode on them, all but the largest firms (which
have an eye towards exporting their software and keeping the development
time/costs for localized versions down) would probably ignore Unicode and I18N.

Or worse, they would try to reinvent the wheel, and mess up. There's a lot of
software out there with "good intentions" whose developers tried to I18N it and
fell short. By including a stable, working BIDI implementation in the library
and making the library standard equipment, the chances improve that people with
BIDI needs will get more BIDI-capable software. Why? Because the functionality
was already there.

By putting the "extras" (stuff that could be implemented by the programmer
{BIDI, character conversion}) into the standard libraries, a few developers who
thought that BIDI was a specification for connecting music synthesizers will
take a little more careful look at it, and might keep their app open enough to
allow for BIDI support.
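For what it's worth, later Java versions did ship exactly this kind of
"standard equipment" BIDI analysis. A sketch of mine assuming the java.text.Bidi
class (which postdates this post):

```java
import java.text.Bidi;

public class BidiDemo {
    public static void main(String[] args) {
        // English mixed with Hebrew: the library resolves the directional
        // runs, so the application never implements the BIDI algorithm itself.
        String mixed = "abc \u05D0\u05D1\u05D2 def";
        Bidi bidi = new Bidi(mixed, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);

        System.out.println(bidi.isMixed());      // both LTR and RTL text present
        System.out.println(bidi.getRunCount());  // number of directional runs
    }
}
```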

Call me a cynic, but it's thanks to "Unicode support built in" products like NT,
CE, and Java -forcing- developers to think "the world does not revolve around
ASCII" that Unicode is where it is today. Without this support, all but the
largest firms would dismiss I18N and Unicode as "something for places with
expensive, specialized, I18N/L10N teams." Unicode would have been dismissed as
an exotic specification useful only for a few small, specialized applications:
"No need for our software exists outside the U.S., and adding I18N support is
too hard/expensive and the market is not worthwhile."

So once Java (1.2?) and NT (5.0?) take the lead and make BIDI "standard
equipment", a great many firms that would have just ignored BIDI will provide
support for it. Why? Because it was already there-- they didn't have to develop
it (or buy an expensive I18N tool set).

Sure, you can support BIDI in COBOL or even FORTRAN. But most apps don't. I'm
willing to bet that if BIDI were "standard equipment," a lot more would.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:39 EDT