Re: Java and Unicode

From: John O'Conner (john.oconner@eng.sun.com)
Date: Wed Nov 15 2000 - 18:02:55 EST


Jungshik Shin wrote:

> That's exactly what I have in mind about Java. I can't help wondering why
> Sun chose 2byte char instead of 4byte char when it was plainly obvious
> that 2byte wouldn't be enough in the very near future. The same can be
> said of Mozilla which internally uses BMP-only as far as I know.
> Was it due to concerns over things like saving memory/storage, etc?

Yes. If you have been involved with Unicode for any length of time, you will
know that the Unicode Consortium has advertised Unicode's 16-bit encoding for a
long, long time, even in its latest Unicode 3.0 spec. The Unicode 3.0 spec
clearly favors the 16-bit encoding of Unicode code units, and the design
chapter (chapter 2) never even hints at a 32-bit encoding form. The Java char
attempts to capture the basic encoding unit of this 16-bit, widely accepted
encoding. I'm sure the choice seemed plainly obvious at the time.
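To make the consequence of that 16-bit code unit concrete, here is a minimal
sketch of my own (the character U+1D11E MUSICAL SYMBOL G CLEF is just an
arbitrary example from outside the BMP): such a character has to be stored as
two chars, a surrogate pair, so a String holding that one character reports a
length of two code units.

    // Minimal illustration: a Java char is a 16-bit UTF-16 code unit,
    // so one character beyond the BMP occupies two chars.
    public class CharUnits {
        public static void main(String[] args) {
            // U+1D11E written as the surrogate pair D834 DD1E
            String gClef = "\uD834\uDD1E";

            // Prints 2: length() counts 16-bit code units, not characters
            System.out.println("length() in code units: " + gClef.length());

            // Prints d834 and dd1e: the high and low surrogates
            System.out.println("high surrogate: " + Integer.toHexString(gClef.charAt(0)));
            System.out.println("low surrogate:  " + Integer.toHexString(gClef.charAt(1)));
        }
    }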

The previous 2.0 spec (and earlier specs as well) promoted this 16-bit encoding
too...and even claimed that Unicode was a 16-bit, "fixed-width" coded character
set. There are lots of reasons why Java's char is a 16-bit value...the fact
that the Unicode Consortium itself has promoted and defined Unicode as a 16-bit
coded character set for so long is probably the biggest.

-- John O'Conner


