RE: How will software source code represent 21 bit unicode characters?

From: addison@inter-locale.com
Date: Thu Apr 26 2001 - 12:23:25 EDT


On Mon, 23 Apr 2001, Mike Brown wrote:

> A char corresponds to a Unicode value -- a UTF-16 code value, which could
> either represent a Unicode character or one half of a surrogate pair. In the
> latter case, it would take a sequence of two "char"s to make one Unicode
> character. It is my understanding that Java's character encoding/decoding
> mechanisms can handle this sort of thing already. However, this is not
> obvious when looking at any Java platform documentation.
>
Actually, Java currently doesn't handle surrogate characters as anything
other than individual code points. You can blithely use an unpaired
surrogate and Java won't complain. Similarly, there is no way to access
the Unicode Scalar Value or any of the character attributes referred to by
a (valid) surrogate pair [which shouldn't be surprising, if you consider
that current JREs reflect an older Unicode standard in which no characters
are actually assigned "out there in the ethereal planes" and thus there
is no character information _to_ access].
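To give a sense of the bookkeeping this pushes onto the application, here is a rough sketch of my own (nothing like it exists in the current JDK API) of the arithmetic you'd have to do yourself today to get from a surrogate pair back to a scalar value; the character used is an arbitrary example:

    public class SurrogateDemo {
        // Combine a high and low surrogate into a Unicode scalar value.
        // This is plain arithmetic from the Unicode standard; the current
        // JRE provides no method that does it for you.
        static int toScalarValue(char high, char low) {
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
        }

        public static void main(String[] args) {
            // U+10400 expressed as the surrogate pair D801 DC00.
            char high = '\uD801';
            char low = '\uDC00';
            System.out.println(Integer.toHexString(toScalarValue(high, low)));
            // prints 10400
        }
    }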

And the Java platform documentation is quite explicit about how
Unicode encodings are handled internally: you aren't supposed to
know! Each JRE can choose its own course. In fact, from John O'Conner's
presentation at TUC last fall, I suspect that the char == int relationship
in Sun's environment will be supplanted by a 32-bit representation for
the char datatype, while the String object will remain UTF-16
internally. How well this works and what this breaks remain to be
seen. Brian Beck is supposed to make a presentation on Unicode 3.0 support
in JDK 1.4 at JavaOne this year, which should be quite interesting in this
regard.
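For what it's worth, the UTF-16 nature of String is already visible at the API level: a single supplementary character occupies two chars, and length() counts code units rather than characters. A small illustration (again my own example, with an arbitrarily chosen character):

    public class StringLengthDemo {
        public static void main(String[] args) {
            // One supplementary character (U+10400) stored as the pair D801 DC00.
            String s = "\uD801\uDC00";
            System.out.println(s.length());
            // prints 2 -- code units, not characters
            System.out.println(Integer.toHexString(s.charAt(0)));
            // prints d801, the high surrogate
        }
    }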

Best Regards,

Addison

Addison P. Phillips
Globalization Architect
webMethods, Inc.


