Re: Java and Unicode

From: John O'Conner (john.oconner@eng.sun.com)
Date: Tue Nov 14 2000 - 14:15:58 EST


You can currently store UTF-16 in the String and StringBuffer classes. However,
all operations are on char values or 16-bit code units. The upcoming release of
the J2SE platform will include support for Unicode 3.0 (maybe 3.0.1)
properties, case mapping, collation, and character break iteration. There is no
explicit support for Unicode surrogate pairs at this time, although you can
certainly find out whether a code unit is a surrogate.
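
To illustrate that last point, here is a minimal sketch of checking code units
by hand. It relies only on the UTF-16 surrogate ranges (0xD800-0xDBFF for high
surrogates, 0xDC00-0xDFFF for low surrogates) and on String.charAt and
String.length. The class SurrogateSketch and its method names are hypothetical
and are not part of the J2SE API.

```java
// Sketch only: SurrogateSketch and its methods are hypothetical helpers,
// not part of the J2SE API.
final class SurrogateSketch {

    // High (leading) surrogates occupy 0xD800..0xDBFF.
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    // Low (trailing) surrogates occupy 0xDC00..0xDFFF.
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // Combine a valid high/low pair into the code point it encodes.
    // Assumes the caller has already checked both halves.
    static int toCodePoint(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    // Count characters, treating each well-formed surrogate pair as one.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isHighSurrogate(c) && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // U+10400 is encoded in UTF-16 as the code units 0xD801, 0xDC00.
        String s = "a" + (char) 0xD801 + (char) 0xDC00 + "b";
        System.out.println(s.length());         // code units
        System.out.println(countCodePoints(s)); // characters
        System.out.println(Integer.toHexString(
                toCodePoint((char) 0xD801, (char) 0xDC00)));
    }
}
```

Note that String.length() still reports code units, so any code that needs a
character count has to walk the string itself, as countCodePoints does here.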

In the future, as characters beyond 0xFFFF become more important, you can
expect that more robust, official support will follow.

-- John O'Conner

Jani Kajala wrote:

> As Unicode will soon contain characters defined beyond the code point range
> [0,65535], I'm wondering how Java is going to handle this.
>
> I didn't find any hints in the JDK documentation either. When I browsed the
> Java internationalization documentation a few days ago, I just saw a comment
> that 'Unicode is a 16-bit encoding' (two errors in one sentence).
>
> Regards,
> Jani Kajala


