Re: FW: Algorithm

From: schererm@us.ibm.com
Date: Fri Mar 26 1999 - 10:30:29 EST


hello,

a lot of software writes what is called a unicode "signature" (or "byte
order mark"): if the text is encoded in unicode, its first character is
U+feff. since each unicode encoding form serializes that character into a
different byte sequence, you have to check the start of the data for several
variants. if you don't find any of them, assume the default encoding of your
system.

the signature character is serialized into the following byte sequences by
the encoding forms:

UTF-8: ef bb bf
UTF-16BE (big endian): fe ff
UTF-16LE (little endian): ff fe
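
as a quick cross-check of the table above, here is a minimal sketch that
encodes U+feff with each encoding form and prints the resulting bytes. the
class name SignatureBytes and the method signatureHex are made up for this
example, and it assumes a java version that has
java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SignatureBytes {
    /** Encodes U+FEFF with the given charset and returns the bytes as lowercase hex. */
    public static String signatureHex(Charset charset) {
        byte[] bytes = "\uFEFF".getBytes(charset);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            // mask to 0..255 so negative bytes print as two hex digits
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + signatureHex(StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + signatureHex(StandardCharsets.UTF_16BE));
        System.out.println("UTF-16LE: " + signatureHex(StandardCharsets.UTF_16LE));
    }
}
```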

once you know which signature (if any) is present, instantiate an
InputStreamReader with the encoding omitted (for the system default
encoding), or with "UTF8", "UnicodeBig", or "UnicodeLittle", respectively.
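
the whole procedure could look roughly like the sketch below. it is only one
way to do it, under a few assumptions: the class name SignatureReader and the
method open are hypothetical, it uses the StandardCharsets constants instead
of the encoding name strings, and it chooses to consume the signature bytes so
that the reader does not hand back U+feff as the first character.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SignatureReader {
    /** Opens a Reader whose encoding is chosen from a leading signature, if there is one. */
    public static Reader open(InputStream in) throws IOException {
        // room to push back up to 3 bytes that turn out not to be a signature
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream is shorter than 3 bytes
            }
            n += r;
        }

        Charset charset = null; // null means: fall back to the system default
        int skip = 0;           // number of signature bytes to consume
        if (n >= 3 && at(head, 0) == 0xEF && at(head, 1) == 0xBB && at(head, 2) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            skip = 3;
        } else if (n >= 2 && at(head, 0) == 0xFE && at(head, 1) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            skip = 2;
        } else if (n >= 2 && at(head, 0) == 0xFF && at(head, 1) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            skip = 2;
        }

        // give back every byte that was read but is not part of the signature
        if (n > skip) {
            pin.unread(head, skip, n - skip);
        }
        return charset == null
                ? new InputStreamReader(pin)
                : new InputStreamReader(pin, charset);
    }

    // reads one byte as an unsigned value
    private static int at(byte[] data, int index) {
        return data[index] & 0xFF;
    }
}
```

a caller would then use something like
SignatureReader.open(new FileInputStream(name)) and read characters as usual.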

if you need help, send me another email.

markus

Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
schererm@us.ibm.com
                        Unicode is here! --> http://www.unicode.org/


