Re: Byte Order Marks

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Apr 19 2001 - 13:12:16 EDT


There is an RFC about UTF-16 (RFC 2781) that explains this:

If the text is labeled by the protocol as
charset=UTF-16 then the first two bytes are the byte order mark
charset=UTF-16BE then it is big-endian and the first two bytes are just text
charset=UTF-16LE then it is little-endian and the first two bytes are just text

If you don't have any clue about the byte order, but you know it is UTF-16, then assume BE.
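
As a rough illustration of these rules, here is a minimal Python sketch. The function name decode_utf16 and the way the charset label is passed in are my own assumptions for the example, not something taken from the RFC; it only shows one way the label and the first two bytes could be combined.

```python
import codecs


def decode_utf16(data: bytes, charset: str) -> str:
    """Decode UTF-16 bytes using the charset label (hypothetical helper)."""
    label = charset.upper()
    if label == "UTF-16BE":
        # Explicitly big-endian: the first two bytes are just text.
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        # Explicitly little-endian: the first two bytes are just text.
        return data.decode("utf-16-le")
    if label == "UTF-16":
        # Look for a byte order mark and strip it if present.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        # Known to be UTF-16 but no clue about the byte order: assume BE.
        return data.decode("utf-16-be")
    raise ValueError("not a UTF-16 charset label: " + charset)
```

Note that with the BE/LE labels a leading FE FF or FF FE is kept as the character U+FEFF in the decoded text rather than being dropped.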

The same applies to UTF-32, UTF-32BE and UTF-32LE, except that the byte order mark is four bytes there.

If you don't know anything about your text, then you can fall back on heuristics or reject the text.
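
One possible heuristic, sketched in Python purely as an example: text that is mostly ASCII or Latin-1 has a zero high byte in almost every 16-bit unit, so counting where the zero bytes fall hints at the byte order. The function name guess_utf16_byte_order is made up for this sketch, and the approach is an assumption of mine that will not work for text with few such characters (CJK, for instance).

```python
def guess_utf16_byte_order(data: bytes) -> str:
    """Guess 'BE' or 'LE' from the position of zero bytes (hypothetical helper)."""
    # Bytes at even offsets are the high bytes if the text is big-endian.
    even_zeros = data[0::2].count(0)
    # Bytes at odd offsets are the high bytes if the text is little-endian.
    odd_zeros = data[1::2].count(0)
    if even_zeros > odd_zeros:
        return "BE"
    if odd_zeros > even_zeros:
        return "LE"
    # No evidence either way: the caller has to pick a default or reject.
    return "unknown"
```

If the result is "unknown", rejecting the text is probably the safer choice.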

markus

Tomas McGuinness wrote:
> A quick question relating to the Byte Order Mark of UCS-2. If it's absent, is
> it safe to assume any particular order (i.e. big- or little-endian)?


