RE: UCS-2 and UTF-16

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Fri Sep 13 2002 - 12:29:07 EDT


#2 is wrong - but perhaps just a typo.

If you have an array of bytes that contains a UTF-8 string, you would write

String myUCS2String = new String (byteArray, "UTF-8");

to convert it to a Java String. The character set parameter is the name of
the character set you're converting _from_, not _to_.
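To make the "from, not to" point concrete, here is a minimal sketch. The class and helper names (DecodeFromExample, decode) are made up for illustration; only the String constructor comes from the discussion above. It decodes the same two bytes once with the correct source charset and once with a wrong one.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical illustration class; the names are not from the original post.
final class DecodeFromExample {

    // Decode bytes into a Java String; charsetName names the encoding the
    // bytes are currently in (the "from" side of the conversion).
    static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Only reached if the JVM does not know the charset name.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of U+00E9 (e with acute accent) is 0xC3 0xA9.
        byte[] byteArray = { (byte) 0xC3, (byte) 0xA9 };

        String right = decode(byteArray, "UTF-8");      // one character: U+00E9
        String wrong = decode(byteArray, "ISO-8859-1"); // two characters: U+00C3 U+00A9

        System.out.println(right.length()); // 1
        System.out.println(wrong.length()); // 2
    }
}
```

In both calls the result is an ordinary Java String; the second argument only tells the constructor how to interpret the incoming bytes.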

What is stringInSayUTF8 supposed to be? It cannot be a String object, since
String objects always hold UTF-16 (pace Addison). If you want to work with
UTF-8 in Java, you should hold the UTF-8 data in a byte array.
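As a rough sketch of what holding the UTF-8 data in a byte array looks like (the class and method names Utf8BytesExample, toUtf8 and fromUtf8 are invented for this example), the String stays in its internal form and UTF-8 exists only as bytes:

```java
import java.io.UnsupportedEncodingException;

// Hypothetical illustration class; the names are not from the original post.
final class Utf8BytesExample {

    // Encode a String into UTF-8 bytes.
    static byte[] toUtf8(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Decode UTF-8 bytes back into a String.
    static String fromUtf8(byte[] utf8) {
        try {
            return new String(utf8, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";   // "cafe" with an acute accent on the e
        byte[] utf8 = toUtf8(text);  // the UTF-8 form lives in a byte array

        System.out.println(text.length());               // 4 UTF-16 code units
        System.out.println(utf8.length);                 // 5 bytes in UTF-8
        System.out.println(fromUtf8(utf8).equals(text)); // true
    }
}
```

Calling getBytes() with no argument uses the platform default charset instead, so naming the charset explicitly is the safer habit.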

Cheers

- rick

-----Original Message-----
From: pr1@club-internet.fr [mailto:pr1@club-internet.fr]
Sent: Friday, 13 September 2002 2:45
To: unicode@unicode.org
Subject: UCS-2 and UTF-16

Hello,

thank you for the useful information which you all provided.

However, I am now completely confused.

Let me recapitulate. Please tell me if any of my assertions are wrong.

1) According to the Microsoft Knowledge Base Article number
Q2322580, SQL Server 2000 stores data in UCS-2.

2) As far as Java 2 is concerned, UTF-16 is the same as UCS-2. That is,
in order to convert a String in a particular encoding (e.g. UTF-8), you
would use the following String constructor.

String myUCS2String = new String ( stringinSayUTF8.getBytes(),
"UTF-16" );

or

String myUCS2String = new String ( stringinSayUTF8.getBytes() ); since
UTF-16 is Java's default encoding.

3) Since JRun 3.1 uses the ISO8859_1 charset to pass parameters via
HTML headers, you must retrieve bytes using that encoding and convert
those bytes to UTF-16 before storing them in the SQL Server 2000
database:

byte[] byt = newFaqLibelle.getBytes( "ISO8859_1" );

String newFaqLibelleIsoVersUtf = new String( byt, "UTF-16" );

or

String newFaqLibelleIsoVersUtf = new String( byt );

<store in DB>

4) To retrieve multiple-byte characters from a SQL Server 2000 DB, you
must convert them back to UTF-8 as follows:

out.println( new String( myDataFromDB.getBytes(), "UTF-8" ) );

5) Since some JDBC drivers use the OS's default charset (Cp1252 in my
case), the above conversions are totally USELESS. I surmise that my
JDBC version is not nvarchar compatible. How can I find that out?
Unfortunately, you can't just type "jdbc -v" on Windows to find the jdbc
version.

What is strange is that the JRun 3.1 EJBs that were developed by my
predecessors store and retrieve Asian characters from the database
without any problems, whereas I am having lots of problems using JSPs,
JavaBeans + JDBC. Does anyone have any explanation for this?

Many thanks.

Best regards,

Philippe de Rochambeau


