From: Yung-Fong Tang (email@example.com)
Date: Tue Mar 11 2003 - 19:08:49 EST
This happens because the following steps got applied to your Unicode data:
1. The \u escapes are converted to Unicode, which gives
three Unicode characters:
U+FFE2, U+FF80, U+FF93
This is ok.
2. Some "throw away the high 8 bits" code got applied to those characters, so
they became 3 bytes:
E2 80 93
3. Then some code treats those bytes as UTF-8 and tries to convert them to UCS2 again, so
E2 = 1110 0010 and the rightmost 4 bits 0010 will be used for UCS2
80 = 1000 0000 and the rightmost 6 bits 00 0000 will be used for UCS2
93 = 1001 0011 and the rightmost 6 bits 01 0011 will be used for UCS2
[0010] [00 0000] [01 0011] = 0010 0000 0001 0011 = 2013
U+2013 is EN DASH
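The three steps above can be reproduced with a small Java sketch. This is only an illustration of the arithmetic, not the code from your application: the class name EnDashRoundTrip and its methods are hypothetical names made up for this example, and I am assuming the truncation in step 2 behaves like a plain char-to-byte cast.

```java
import java.nio.charset.StandardCharsets;

public class EnDashRoundTrip {

    // Step 2: keep only the low 8 bits of every UTF-16 code unit.
    static byte[] dropHighBits(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // the narrowing cast discards the high 8 bits
        }
        return out;
    }

    // Step 3, by hand: combine the payload bits of a three-byte UTF-8 sequence
    // (4 bits from the lead byte, 6 bits from each continuation byte).
    static int decodeThreeByteUtf8(byte b0, byte b1, byte b2) {
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    // Step 3, using the JDK's own UTF-8 decoder.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Step 1: the escapes in the literal are already three Unicode characters.
        String original = "\uFFE2\uFF80\uFF93";

        byte[] narrowed = dropHighBits(original); // E2 80 93
        int byHand = decodeThreeByteUtf8(narrowed[0], narrowed[1], narrowed[2]);
        String decoded = decodeAsUtf8(narrowed);

        System.out.printf("by hand: U+%04X%n", byHand);
        System.out.printf("decoder: U+%04X%n", (int) decoded.charAt(0));
    }
}
```

Both lines should print U+2013, which matches the hand calculation above.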
So... somewhere in your code there is something very, very bad which will corrupt your data.
Steps 2 and 3 are very bad. You probably need to find out where they are
and remove that code.
Read my paper at http://people.netscape.com/ftang/paper/textintegrity.html
Your Java code probably has one or two of the bugs which are listed in my paper.
Jain, Pankaj (MED, TCS) wrote:
>thanks, its working for me now.
>But still I have a doubt that why \uFFE2\uFF80\uFF93 is giving ndash in
>if you have any information on this, than pls let me know.
>From: firstname.lastname@example.org [mailto:email@example.com]
>Sent: Monday, March 10, 2003 7:59 PM
>To: Jain, Pankaj (MED, TCS)
>Subject: Re: Unicode character transformation through XSLT
>Pankaj Jain wrote,
>>My problem is that, I am getting Unicode character(\uFFE2\uFF80\uFF93)
>>from resource bundle property file which is equivalent to ndash(-) and
>U+2013 is the ndash (–). It is represented in UTF-8 by three
>hex bytes: E2 80 93.
>But, \uFFE2 is fullwidth not sign
>\uFF80 is half width katakana letter ta
>and \uff93 is half width katakana letter mo.
>Perhaps the reason you see three question marks is that the font
>you are using doesn't support full width and half width characters.
>What happens if you replace your string \uFFE2\uFF80\uFF93 with
This archive was generated by hypermail 2.1.5 : Tue Mar 11 2003 - 19:46:09 EST