Re: Java and Unicode

From: Elliotte Rusty Harold (
Date: Wed Nov 15 2000 - 09:23:24 EST

One thing I'm very curious about going forward: Right now character
values greater than 65535 are purely theoretical. However, this will
change. It seems to me that handling these characters properly is
going to require redefining the char data type from two bytes to
four. This is a major incompatible change with existing Java.

There are a number of possibilities that don't break backwards
compatibility (making trans-BMP characters require two chars rather
than one, defining a new wchar primitive data type that is 4 bytes
long as well as the old 2-byte char type, etc.) but they all make the
language a lot less clean and obvious. In fact, they all more or less
make Java feel the way C and C++ do when working with Unicode: like
something new has been bolted on after the fact, and it doesn't
really fit the old design.
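
To make the first option concrete, here is a minimal sketch of what
"two chars rather than one" could look like, assuming the UTF-16
surrogate-pair scheme is the encoding used. The class and method names
(SurrogateSketch, toSurrogates, fromSurrogates) are hypothetical and
are not part of any Java API; this only illustrates the arithmetic.

```java
class SurrogateSketch {
    // Split a code point above U+FFFF into two 16-bit chars
    // (a high surrogate followed by a low surrogate).
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                "not a trans-BMP code point: " + codePoint);
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    // Recombine a surrogate pair into the original code point.
    static int fromSurrogates(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x10400);
        String s = new String(pair);
        // One character to the reader, but two chars to the language.
        System.out.println(s.length());                 // prints 2
        System.out.println(
            Integer.toHexString(fromSurrogates(pair[0], pair[1]))); // prints 10400
    }
}
```

Under this approach existing code keeps compiling, but length() and
charAt() count 16-bit units rather than characters, which is exactly
the loss of cleanliness described above.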

Are there any plans for handling this?


