RE: Java 1.0.2 Native2ASCII

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 04 1998 - 21:13:51 EDT


Mike,

>
> The information I gathered about Java 1.0.2 is located in the JDK at
> http://java.sun.com/docs/books/jls/html/javalang.doc4.html. This indicates
> the version of Unicode used by Java 1.0.2.

O.k. Now I get it. javalang.doc4.html refers to a table it claims is at:

ftp://unicode.org/pub/MappingTables/UnicodeData-1.1.5.txt

That URL does not exist and hasn't for some time now.

UnicodeData-1.1.5.txt was the 5th revision of the Unicode Character Database
associated with Unicode 1.1. It presumably did reside at that URL for
a while, but it was later replaced with UnicodeData-1.2.3.txt (which still
referred to Unicode 1.1, with some corrections in the data file).

In any case, it is technically incorrect for javalang.doc4.html to be
referring to "Unicode 1.1.5" as if it were a version of the standard
itself.

>
> I am aware that Hangul was changed in Unicode 2.0, but was unable to find
> definitive information on whether Eastern/Western European or Cyrillic
> characters were remapped. With a quick test of Native2ASCII that comes with
> JDK 1.1.6 I produced some serious remapping of Cyrillic characters.

There was absolutely no moving or deletion of Latin or Cyrillic characters
between Unicode 1.1 and Unicode 2.0. There has been, however, massive emendation
of various normative and informative properties associated with characters
between UnicodeData-1.2.3.txt and the currently published UnicodeData-2.0.14.txt:

ftp://www.unicode.org/Public/UNIDATA/UnicodeData-Latest.txt

So if by "remapped", you are referring to case mapping, for instance, then yes,
there were significant emendations to the informative case mappings -- including
most of the corrections claimed in javalang.doc4.html.
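
To make "case mapping" concrete: each line of a UnicodeData file is a
semicolon-separated record for one character, and the case mappings are just
fields in that record. The following is a minimal sketch, not anything taken
from the JDK. It assumes the 15-field layout of the current file (uppercase
mapping in field 12, counting from 0); the class and method names are made up
for illustration, and the sample line is the entry for U+0061.

```java
public class UnicodeDataCaseSketch {
    // Hypothetical helper: code point of the record (field 0, hexadecimal).
    public static int codePoint(String line) {
        return Integer.parseInt(line.split(";", -1)[0], 16);
    }

    // Hypothetical helper: simple uppercase mapping (field 12, hexadecimal),
    // or -1 when the field is empty. Assumes the 15-field record layout.
    public static int simpleUppercase(String line) {
        // Limit -1 keeps trailing empty fields, so indexes stay stable.
        String[] fields = line.split(";", -1);
        if (fields.length < 15) {
            throw new IllegalArgumentException("expected 15 fields: " + line);
        }
        return fields[12].isEmpty() ? -1 : Integer.parseInt(fields[12], 16);
    }

    public static void main(String[] args) {
        String line = "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041";
        System.out.printf("U+%04X -> U+%04X%n",
                codePoint(line), simpleUppercase(line));
    }
}
```

A correction to such a field changes what an implementation reports as the
uppercase form of a character. It does not move the character to a different
code point.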

If by "remapped" you are referring to mappings between Unicode and other
character sets, then UnicodeData-XYZ.txt, of whichever version, is not
implicated. The Unicode Character Database contains no cross-mappings to other
character sets. That data is provided elsewhere, in separate mapping tables.

If native2ascii shows significant differences (for other than Hangul) between
JDK 1.0.2 and JDK 1.1.6, that must be the result of different conversion tables
implemented underneath the sun/io/ByteToCharXXX.class definitions, and is
not the result of an encoding difference between Unicode 1.1 and Unicode 2.0
for Latin or Cyrillic characters.
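
A rough sketch of why the converter table matters, written against the public
java.nio.charset API rather than the internal sun/io classes (whose contents I
am not reproducing here). The class and method names are hypothetical, and it
assumes the runtime provides the "windows-1251" and "KOI8-R" charsets. It
decodes native bytes with a named charset and writes every non-ASCII character
as a backslash-u escape, which is roughly what native2ascii does.

```java
import java.nio.charset.Charset;

public class Native2AsciiSketch {
    // Writes ASCII characters as-is and every other UTF-16 code unit as a
    // backslash-u escape with four lowercase hex digits.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Decodes the bytes with the named charset, then escapes the result.
    // Throws an unchecked exception if the charset is not available.
    public static String convert(byte[] nativeBytes, String charsetName) {
        return escape(new String(nativeBytes, Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        byte[] input = { (byte) 0xC0 };
        // Same byte, two different byte-to-char tables.
        System.out.println(convert(input, "windows-1251"));
        System.out.println(convert(input, "KOI8-R"));
    }
}
```

The single byte 0xC0 comes out as \u0410 (CYRILLIC CAPITAL LETTER A) under
windows-1251 and as \u044e (CYRILLIC SMALL LETTER YU) under KOI8-R. Picking the
wrong table, or shipping a different one, yields exactly the kind of "serious
remapping" of Cyrillic text described above, with no change to Unicode itself.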

--Ken


