Re: Java char and Unicode 3.0+ (was: Canonical equivalence in rendering: mandatory or recommended?)

From: John Cowan (
Date: Wed Oct 15 2003 - 15:18:05 CST

Philippe Verdy scripsit:

> [...] char, whose values are 16-bit unsigned integers
> representing Unicode characters (section 2.1).

Despite your ingenious special pleading, I don't see how this can mean
anything except that chars must be 16-bit unsigned integers.

> The Java language still lacks a way to specify a literal for a character
> outside the BMP. Of course one can use the syntax '\uD800\uDC00', but this
> will not compile with the current _compilers_, which expect only one char in
> the literal. In a String literal, "\uD800\uDC00" becomes the 4-byte UTF-8
> sequence for _one_ Unicode codepoint in the compiled class.

Character literals are crocky anyhow. IMHO modern programming languages
should not have a Character type, but deal only in Strings.
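To make the char-versus-String distinction concrete, here is a minimal sketch of how a supplementary character behaves in a Java String. It assumes a Java 8 or later runtime: the code-point methods used here were only added in J2SE 5.0, after this message was written, and the class name SupplementaryDemo is made up for illustration. Note also that, as far as I know, class files store string constants in "modified UTF-8", which encodes each surrogate separately in three bytes, so the 4-byte form applies to standard UTF-8 output rather than to the constant pool.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
final class SupplementaryDemo {
    // U+10000, the first code point outside the BMP, written as a surrogate pair.
    // The same two escapes inside a char literal are rejected by the compiler,
    // because a char literal may hold only one 16-bit unit.
    static final String S = "\uD800\uDC00";

    // Width of the char type in bits.
    static int charBits() { return Character.SIZE; }

    // Number of 16-bit UTF-16 code units in the string.
    static int utf16Units() { return S.length(); }

    // Number of Unicode code points in the string.
    static int codePoints() { return S.codePointCount(0, S.length()); }

    // The code point that starts at index 0.
    static int firstCodePoint() { return S.codePointAt(0); }

    // Length in bytes when encoded as standard UTF-8.
    static int utf8Length() { return S.getBytes(StandardCharsets.UTF_8).length; }

    // Length in bytes in modified UTF-8, the encoding written by
    // DataOutputStream.writeUTF (and used for class-file string constants).
    static int modifiedUtf8Length() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(S);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2; // drop the 2-byte length prefix
    }

    public static void main(String[] args) {
        System.out.println("char bits:          " + charBits());
        System.out.println("UTF-16 code units:  " + utf16Units());
        System.out.println("code points:        " + codePoints());
        System.out.println("first code point:   U+"
                + Integer.toHexString(firstCodePoint()).toUpperCase());
        System.out.println("UTF-8 bytes:        " + utf8Length());
        System.out.println("modified UTF-8:     " + modifiedUtf8Length());
    }
}
```

The String reports two chars but one code point, which is the practical consequence of char staying a 16-bit unsigned integer: a supplementary character can only live in a String (or an int), never in a single char.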

> 2. The initial spec of UTF-32 and UTF-8 by ISO allowed many more planes with
> 31-bit codepoints, and maybe there will be an agreement sometime in the
> future between ISO and Unicode to define new codepoints outside the current
> standard first 17 planes, which can be safely converted with UTF-16,

I doubt it very much. 17 planes is waaaay more than sufficient.
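For reference, the arithmetic behind the 17-plane limit can be sketched as follows. This is only an illustration under the usual UTF-16 definitions (high surrogates D800..DBFF, low surrogates DC00..DFFF); the class and method names are made up.

```java
// Hypothetical demo class; not part of any library.
final class PlaneArithmetic {
    static final int PLANE_SIZE = 0x10000; // 65,536 code points per plane
    static final int PLANES = 17;          // the BMP plus 16 supplementary planes

    // Total code points in planes 0..16.
    static int totalCodePoints() { return PLANES * PLANE_SIZE; }

    // Distinct surrogate pairs: 1,024 high surrogates times 1,024 low surrogates,
    // which is exactly 16 planes' worth of code points.
    static int surrogatePairs() { return 0x400 * 0x400; }

    // Combine a surrogate pair into a code point by the standard UTF-16 formula.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        System.out.println("total code points: " + totalCodePoints());
        System.out.println("surrogate pairs:   " + surrogatePairs());
        System.out.println("highest code point: U+" + Integer.toHexString(
                toCodePoint('\uDBFF', '\uDFFF')).toUpperCase());
    }
}
```

The surrogate pairs cover the 16 supplementary planes and the BMP supplies the 17th, so U+10FFFF is the highest value UTF-16 can reach; anything in the old 31-bit space beyond that has no UTF-16 form at all.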

John Cowan
Assent may be registered by a signature, a handshake, or a click of a computer
mouse transmitted across the invisible ether of the Internet. Formality
is not a requisite; any sign, symbol or action, or even willful inaction,
as long as it is unequivocally referable to the promise, may create a contract.
       --_Specht v. Netscape_
