Re: FW: Algorithm

Date: Fri Mar 26 1999 - 10:30:29 EST


a lot of software writes what is called a unicode "signature" (or "byte
order mark"): the first character is U+feff if the encoding is unicode.
since each unicode encoding form serializes that character into a different
byte sequence, you may have to check for several variants. if you
don't find any of them, then you should assume the default encoding of your

the signature character is serialized into the following byte sequences by
the encoding forms:

UTF-8: ef bb bf
UTF-16BE (big endian): fe ff
UTF-16LE (little endian): ff fe
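
as a rough sketch of the check described above: compare the first bytes of the data against the three sequences in the list. the class name BomSniffer and the method names are made up for illustration. it uses java.nio.charset.StandardCharsets (java 7 and later), which is an assumption on my part and not something this mail relies on. a null result means no signature was found.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// hypothetical helper, not part of any library
final class BomSniffer {

    /** Returns the charset indicated by a leading signature, or null if there is none. */
    static Charset detect(byte[] head) {
        // check the 3-byte UTF-8 form first, then the two 2-byte UTF-16 forms
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return null; // no signature: caller falls back to its default encoding
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // mask to 0..255 because java bytes are signed
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

passing only the first three bytes of the file is enough, since no signature listed here is longer than that.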

from there, instantiate an InputStreamReader with the encoding omitted (for
the system default encoding), or with "UTF8", "UnicodeBig", or
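
a minimal sketch of how the reader could be set up, under some assumptions: SignatureReader and open are hypothetical names, the stream is wrapped in a PushbackInputStream so that bytes that turn out not to be a signature can be put back, and Charset constants (java 7 and later) stand in for the encoding-name strings. the signature bytes are skipped by hand, so the explicit big-endian and little-endian charsets are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// hypothetical helper, not part of any library
final class SignatureReader {

    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // read up to 3 bytes; a single read() call may return fewer
        int n = 0;
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }

        Charset cs = Charset.defaultCharset(); // no signature: system default
        int skip = 0;                          // number of signature bytes to drop
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            cs = StandardCharsets.UTF_8;
            skip = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;
            skip = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;
            skip = 2;
        }

        // put back everything that was read but is not part of the signature
        if (n > skip) pin.unread(head, skip, n - skip);
        return new InputStreamReader(pin, cs);
    }
}
```

a caller would pass in something like a FileInputStream and read characters from the returned Reader as usual.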

if you need help, send me another email.


Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
                        Unicode is here! -->
