Re: Byte Order Marks

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Apr 19 2001 - 13:12:16 EDT

Next message: Carl W. Brown: "OT Porting to older OSes was RE: Latin w/ diacritics (was Re: benefits of unicode)"
Previous message: Tomas McGuinness: "Byte Order Marks"
In reply to: Tomas McGuinness: "Byte Order Marks"
Next in thread: Yves Arrouye: "RE: Byte Order Marks"

There is an RFC about UTF-16 (RFC 2781, "UTF-16, an encoding of ISO 10646") that explains this:

If the text is labeled by the protocol as
charset=UTF-16, then the first two bytes may be a byte order mark (FE FF for big-endian, FF FE for little-endian), which selects the byte order and is not part of the text
charset=UTF-16BE, then it is big-endian and the first two bytes are just text
charset=UTF-16LE, then it is little-endian and the first two bytes are just text

If you know it is UTF-16 but there is no BOM and no other clue about the byte order, then assume big-endian (BE).
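A minimal Python sketch of the labeling rules above, assuming the protocol hands over the charset label as a string such as "UTF-16BE". The name `decode_utf16` and its parameters are hypothetical, chosen for illustration; only the built-in "utf-16-be" and "utf-16-le" codecs are real, and neither of them strips a BOM.

```python
BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian


def decode_utf16(data: bytes, charset: str = "UTF-16") -> str:
    """Decode UTF-16 bytes according to their charset label (hypothetical helper)."""
    label = charset.strip().upper()
    if label == "UTF-16BE":
        # Byte order is fixed by the label; a leading FE FF is just U+FEFF text.
        return data.decode("utf-16-be")
    if label == "UTF-16LE":
        return data.decode("utf-16-le")
    if label != "UTF-16":
        raise ValueError(f"not a UTF-16 label: {charset!r}")
    # Plain "UTF-16": a leading BOM, if present, selects the order and is dropped.
    if data.startswith(BOM_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(BOM_LE):
        return data[2:].decode("utf-16-le")
    # No BOM and no other clue about the byte order: assume big-endian.
    return data.decode("utf-16-be")


print(decode_utf16(b"\x00H\x00i"))  # no BOM, read as big-endian; prints "Hi"
```

Note the asymmetry: with charset=UTF-16 the bytes FE FF are consumed as a BOM, while with charset=UTF-16BE the same bytes decode to the character U+FEFF.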

The same applies to UTF-32, UTF-32BE and UTF-32LE.
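Carried over to UTF-32, the only real change is that the BOM is four bytes long (00 00 FE FF or FF FE 00 00). This is a sketch assuming, as the post says, that the UTF-16 pattern applies unchanged, including the big-endian default; `decode_utf32` is a hypothetical name.

```python
# 4-byte BOMs: U+FEFF serialized in each byte order.
UTF32_BOMS = {
    b"\x00\x00\xfe\xff": "utf-32-be",
    b"\xff\xfe\x00\x00": "utf-32-le",
}


def decode_utf32(data: bytes, charset: str = "UTF-32") -> str:
    """Decode UTF-32 bytes by label, mirroring the UTF-16 rules (hypothetical helper)."""
    label = charset.strip().upper()
    if label in ("UTF-32BE", "UTF-32LE"):
        # Explicit order in the label: no BOM processing at all.
        return data.decode("utf-32-be" if label == "UTF-32BE" else "utf-32-le")
    if label != "UTF-32":
        raise ValueError(f"not a UTF-32 label: {charset!r}")
    codec = UTF32_BOMS.get(data[:4])
    if codec is not None:
        # Strip the BOM and decode in the order it announced.
        return data[4:].decode(codec)
    # No BOM: fall back to big-endian, as with UTF-16.
    return data.decode("utf-32-be")
```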

If you don't know anything about your text, then you may try some heuristics or reject the text.
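One possible heuristic, sketched under loudly stated assumptions: it is not from the RFC, `guess_unicode_encoding` is a hypothetical name, and the zero-byte test only gives a signal for text that is mostly ASCII. It checks for a BOM first, then looks at where 00 bytes fall, and returns None (reject) when there is no usable evidence.

```python
from typing import Optional

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones: FF FE 00 00
# also begins with the UTF-16LE BOM FF FE (it could even be UTF-16LE text
# starting with U+0000, so this ordering is itself a guess).
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def guess_unicode_encoding(data: bytes) -> Optional[str]:
    """Guess a codec name for unlabeled bytes, or None to reject them."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    # No BOM. In mostly-ASCII UTF-16 text every other byte is 00:
    # zeros at even offsets suggest big-endian, at odd offsets little-endian.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if even_zeros > odd_zeros:
        return "utf-16-be"
    if odd_zeros > even_zeros:
        return "utf-16-le"
    return None  # no usable signal: reject the text
```

For text dominated by characters outside the Latin range (CJK, for example) the zero-byte counts carry little information, which is exactly the case where rejecting the text may be the safer choice.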

markus

Tomas McGuinness wrote:
> A quick question relating to the Byte Order Mark of UCS-2. If it's absent, is
> it safe to assume any particular order (i.e. big- or little-endian)?


This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:16 EDT