RE: Problems converting from UTF-8 to UCS-2 and vice-versa using JRun 3.1, SQL Server 2000, Windows 2000 and Java 3.1

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Thu Sep 12 2002 - 07:51:25 EDT

Previous message: William Overington: "Re: ISRI SoEuro has just been created!!"
Next in thread: Marco Cimarosti: "RE: Problems converting from UTF-8 to UCS-2 and vice-versa using JRun 3.1, SQL Server 2000, Windows 2000 and Java 3.1"
Reply: Marco Cimarosti: "RE: Problems converting from UTF-8 to UCS-2 and vice-versa using JRun 3.1, SQL Server 2000, Windows 2000 and Java 3.1"
Reply: Addison Phillips [wM]: "RE: Problems converting from UTF-8 to UCS-2 and vice-versa using JRun 3.1, SQL Server 2000, Windows 2000 and Java 3.1"

Philippe de Rochambeau wrote:
> On the other hand, if I store the previous "go" character plus an unusual
> CJK ideogram whose Unicode equivalent is \u5439 (E5 90 B9 in UTF-8)
> in the DB and retrieve the data, JRun 3.1 will only display the first
> character in my form's textarea, plus a few invisible characters, and the
> database will contain the following hex values:
>
> E8 AA 9E E5 3F B9 20 20 20 20 20 20 0D 0A 0A
>
> As you can see, "go" is still there, but the following character (E5 3F B9)
> is not \u5439 (E5 90 B9). I cannot figure out how to fix this problem.
>
> Any help with this problem would be much appreciated.

I see what the problem is. As usual, it's all the fault of Bill Gate$. :-)

If you interpret <E5, 90, B9> according to Windows-1252, you see that E5 is
"å" and B9 is "¹", but 90 is an unassigned slot! Unassigned characters are
normally turned into question marks, and the code for "?" is (guess what) 3F...

The sequence <E8, AA, 9E> survives only by chance, because all three bytes are
valid Windows-1252 characters: "è", "ª", and "ž", respectively.
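
To make the effect concrete, here is a minimal sketch that pushes both byte sequences through a Windows-1252 decode and re-encode. The class and method names (Cp1252RoundTrip, roundTrip, hex) are invented for illustration, and it assumes a modern JDK where the "windows-1252" charset is available; I have not checked how JRun 3.1 itself behaves.

```java
import java.nio.charset.Charset;

public class Cp1252RoundTrip {
    // Decode the bytes as Windows-1252, then encode the String back.
    // Bytes with no Windows-1252 assignment cannot survive this trip.
    static byte[] roundTrip(byte[] bytes) {
        Charset cp1252 = Charset.forName("windows-1252");
        String misread = new String(bytes, cp1252);
        return misread.getBytes(cp1252);
    }

    // Format bytes as space-separated upper-case hex, e.g. "E5 90 B9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] go = {(byte) 0xE8, (byte) 0xAA, (byte) 0x9E};    // UTF-8 for \u8A9E
        byte[] u5439 = {(byte) 0xE5, (byte) 0x90, (byte) 0xB9}; // UTF-8 for \u5439
        System.out.println(hex(roundTrip(go)));    // E8 AA 9E (unchanged)
        System.out.println(hex(roundTrip(u5439))); // E5 3F B9 (90 became 3F)
    }
}
```

The second line of output matches the bytes found in the database, which is consistent with a Windows-1252 conversion happening somewhere between the form and the DB.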

I guess that the problem starts when you try to fool the system into
thinking that the text is ISO 8859-1:

byte[] byt = (newQfLibelleArray[i]).getBytes( "ISO8859_1" );
String tempUtf16 = new String( byt );

But sorry, I can't help with a fix, because I don't know the Java APIs well
enough.

Can't you do something like <.getBytes("UTF-8")>? Or, even better, doesn't
(newQfLibelleArray[i]) have a method to return a <String> object directly?
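
For what it is worth, one possible shape of such a fix is sketched below. It assumes that the string really holds UTF-8 bytes that were misread as ISO 8859-1 (one char per byte), which is only my guess about what JRun hands over. The class name Utf8Repair and the method repair are hypothetical. The point is to name the charset when building the String, instead of calling new String(byt), which falls back on the platform default encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Repair {
    // Hypothetical helper. Assumes 'misread' was produced by decoding
    // UTF-8 bytes as ISO 8859-1, so each char holds one original byte.
    static String repair(String misread) {
        // ISO 8859-1 maps chars 0x00-0xFF straight back to bytes 00-FF.
        byte[] raw = misread.getBytes(StandardCharsets.ISO_8859_1);
        // Name the charset explicitly; do not rely on the platform default.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = {
            (byte) 0xE8, (byte) 0xAA, (byte) 0x9E,  // \u8A9E ("go")
            (byte) 0xE5, (byte) 0x90, (byte) 0xB9   // \u5439
        };
        // Simulate the misreading, then undo it.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        String fixed = repair(misread);
        System.out.println(fixed.equals("\u8A9E\u5439")); // true
    }
}
```

Unlike Windows-1252, ISO 8859-1 assigns a character to every byte value, so byte 90 comes through this trip intact.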

_ Marco


This archive was generated by hypermail 2.1.2 : Thu Sep 12 2002 - 08:42:23 EDT