RE: accessing extended ranges

From: Eric Mader (mader@jtcsv.com)
Date: Wed Mar 27 2002 - 14:36:47 EST


At 08:38 AM 3/26/2002, Addison Phillips [wM] wrote:
>The downside is that the GUI stuff, Swing and AWT, don't recognize
>surrogates properly. Paste U+D800 U+DC00 into a Swing control and you'll
>see TWO hollow boxes, not one... the JDK is rendering the characters
>separately. (NB: I haven't tried this test with 1.4, so there may be more
>support there for surrogates).
>
>So, using ICU you can probably do some of the processing you're interested
>in. But GUI apps are going to be very problematic until Swing or AWT are fixed.

JDK 1.4 can render characters coded as surrogate pairs. This works in AWT
and Swing.

>Hope that helps.
>
>Addison
>
>Addison P. Phillips
>Globalization Architect / Manager, Globalization Engineering
>webMethods, Inc. 432 Lakeside Drive, Sunnyvale, CA
>+1 408.962.5487 (phone) +1 408.210.3659 (mobile)
>-------------------------------------------------
>Internationalization is an architecture. It is not a feature.
>
>
> > -----Original Message-----
> > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
> > Behalf Of Ben Monroe
> > Sent: March 26, 2002 0:17
> > To: Unicode list
> > Subject: accessing extended ranges
> >
> >
> > I would like to access some of the characters from "CJK Unified Ideographs
> > Extension B." These are all in the range of 20000-2A6DF. (direct link:
> > http://www.unicode.org/charts/PDF/U20000.pdf )
> >
> > "Basic Latin" appears in the 0000-007F range. The original "CJK Unified
> > Ideographs" all appear within the 4E00–9FAF range. These are all easy to
> > access with U+xxxx (4 x's). In Java, the format \uxxxx works just fine
> > (and likewise at http://www.macchiato.com/unicode/ ). However, how do you
> > access the characters in the larger ranges (i.e., U+xxxxx or \uxxxxx)?
> >
> > Directly using the 5-digit format \uxxxxx produces a Unicode character
> > followed by the 5th x. Here is a quick example:
> >
> > public class UniStringTest {
> >     public static void main(String[] args) {
> >         String s1 = "\u963F"; // displays fine; standard /uxxxx (4 x's)
> >         System.out.println(s1);
> >         String s2 = "\u9FA0"; // also displays fine; standard /uxxxx (4 x's)
> >         System.out.println(s2);
> >         // biggest character that I know (5 x's) but doesn't process
> >         String s3 = "\u2A6A5";
> >         System.out.println(s3);
> >     }
> > }

Note that the Java "\u" notation always uses four digits. The last string
in your code is interpreted as the character U+2A6A followed by "5"
(U+0035). The correct way to write this in Java is to use surrogate pairs:

         String s3 = "\uD869\uDEA5"; // surrogate pair for U+2A6A5
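
In case it helps to see where those two values come from, here is a minimal sketch of the UTF-16 arithmetic. The class name SurrogateSketch and the helper toSurrogatePair are made up for illustration; they are not part of the JDK or ICU. The sketch also shows how the five-digit literal is actually read.

```java
public class SurrogateSketch {
    // Hypothetical helper (not a JDK method): encodes a supplementary
    // code point (U+10000..U+10FFFF) as a two-char UTF-16 string.
    static String toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new String(new char[] { high, low });
    }

    public static void main(String[] args) {
        // The five-digit literal is read as U+2A6A followed by the digit 5.
        String wrong = "\u2A6A5";
        System.out.println(wrong.length());                       // 2
        System.out.println(Integer.toHexString(wrong.charAt(0))); // 2a6a
        System.out.println(wrong.charAt(1));                      // 5

        // The computed pair matches the literal written with two escapes.
        String right = toSurrogatePair(0x2A6A5);
        System.out.println(Integer.toHexString(right.charAt(0)) + " "
                + Integer.toHexString(right.charAt(1)));          // d869 dea5
        System.out.println(right.equals("\uD869\uDEA5"));         // true
    }
}
```

Both strings have length 2 as far as Java is concerned, but only the second one is a single character to anything that understands surrogates.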

> > Thanks,
> >
> > Ben Monroe

Eric Mader
IBM GCoC San José
5600 Cottle Rd M/S 50-2/B11
San Jose, CA 95193




This archive was generated by hypermail 2.1.2 : Wed Mar 27 2002 - 15:25:16 EST