Supplementary Characters in the Java(TM) Platform

John O'Conner - Sun Microsystems, Inc.

Intended Audience: Software Engineers
Session Level: Intermediate

This paper describes how supplementary characters are supported in the Java(TM) platform. Supplementary characters are characters in the Unicode standard whose code points are above U+FFFF, and which therefore cannot be represented as a single 16-bit entity such as a value of the char data type in the Java programming language. Such characters are generally rare, but some are used, for example, in Chinese and Japanese personal names, so support for them is commonly required for government applications in East Asian countries.
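As a rough illustration of why a single char is not enough, the sketch below asks the platform how many 16-bit units a code point needs in UTF-16. The class name SurrogatePairSketch and the helper charUnits are made up for this example; Character.toChars is part of java.lang.Character (J2SE 5.0 and later). The two sample code points are arbitrary choices, not taken from the paper.

```java
public class SurrogatePairSketch {

    // Hypothetical helper: number of 16-bit char units that UTF-16
    // needs to encode the given Unicode code point.
    public static int charUnits(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        int bmp = 0x4E2D;            // a CJK ideograph at or below U+FFFF
        int supplementary = 0x20000; // a CJK ideograph above U+FFFF

        System.out.println(charUnits(bmp));           // prints 1
        System.out.println(charUnits(supplementary)); // prints 2

        // The two units are a high surrogate followed by a low surrogate.
        char[] pair = Character.toChars(supplementary);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // prints D840 DC00
    }
}
```

The code point U+20000 does not fit in 16 bits, so UTF-16 encodes it as the pair D840 DC00, and a lone char can hold only one half of that pair.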

The Java platform is being enhanced to enable processing of supplementary characters with minimal impact on existing applications. New low-level APIs enable operations on individual characters where necessary. Most text-processing APIs, however, use character sequences, such as the String class or character arrays. These are now interpreted as UTF-16 sequences, and the implementations of these APIs have been changed to handle supplementary characters correctly.
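The difference between char-based and code-point-based processing can be sketched as follows. This abstract does not name the new APIs; the example assumes the int-based methods found in java.lang.String and java.lang.Character since J2SE 5.0 (codePointAt, codePointCount, Character.charCount, Character.isLetter(int)). The class CodePointSketch and the method countLetters are hypothetical names used only for illustration.

```java
public class CodePointSketch {

    // Hypothetical helper: counts letters by walking the string one
    // code point at a time, so a surrogate pair is seen as one character.
    public static int countLetters(String text) {
        int letters = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (Character.isLetter(codePoint)) {
                letters++;
            }
            // Advance by 1 char for most characters and by 2 for a
            // supplementary character.
            i += Character.charCount(codePoint);
        }
        return letters;
    }

    public static void main(String[] args) {
        // "a", then the supplementary ideograph U+20000, then "b".
        String text = "a" + new String(Character.toChars(0x20000)) + "b";

        System.out.println(text.length());                         // prints 4 (char units)
        System.out.println(text.codePointCount(0, text.length())); // prints 3 (characters)
        System.out.println(countLetters(text));                    // prints 3
    }
}
```

A loop that calls charAt(i) and the char overload of Character.isLetter would see the two surrogates separately and report only two letters for the same string, which is the kind of code that may need review.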

Besides explaining these enhancements in detail, the paper also provides guidelines for application developers for determining and implementing necessary changes to enable use of the complete Unicode character set.