RFC 2781 ("UTF-16, an encoding of ISO 10646") explains this.
If the text is labeled by the protocol as
- charset=UTF-16, then the first two bytes should be the byte order mark (FE FF for big-endian, FF FE for little-endian);
- charset=UTF-16BE, then it is big-endian and the first two bytes are just text;
- charset=UTF-16LE, then it is little-endian and the first two bytes are just text.
If you know the text is UTF-16 but have no indication of the byte order (no byte order mark and no BE/LE label), then assume big-endian.
The same applies to UTF-32, UTF-32BE and UTF-32LE, where the byte order mark is four bytes long.
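
The UTF-16 rules above can be sketched in a few lines of Python. This is only an illustration, not code from the RFC: the function name decode_utf16 and its charset parameter are made up for this example, and it relies on Python's "utf-16-be" and "utf-16-le" codecs, which leave a leading U+FEFF in place instead of treating it as a byte order mark.

```python
import codecs


def decode_utf16(data: bytes, charset: str = "UTF-16") -> str:
    """Decode UTF-16 bytes according to the charset label (hypothetical helper)."""
    label = charset.upper()

    # Explicit byte order: the first two bytes are ordinary text, so a
    # leading FE FF / FF FE decodes to U+FEFF and is kept.
    if label == "UTF-16BE":
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        return data.decode("utf-16-le")
    if label != "UTF-16":
        raise ValueError(f"unsupported charset label: {charset!r}")

    # Plain "UTF-16": look for a byte order mark and strip it.
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")

    # No byte order mark: assume big-endian.
    return data.decode("utf-16-be")
```

For example, decode_utf16(b"\xff\xfeA\x00") uses the byte order mark to pick little-endian, while decode_utf16(b"\x00A") has no mark and falls back to big-endian.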
If you don't know anything about your text, then you can either fall back on heuristics or reject the text.
markus
Tomas McGuinness wrote:
> A quick question relating to the Byte Order Mark of UCS-2. If it's absent, is
> it safe to assume any particular order (i.e. Big or Little Endian)?
This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT