Re: BOM

From: Yung-Fong Tang (ftang@netscape.com)
Date: Fri Jun 25 1999 - 20:45:43 EDT


It is interesting that they mention "(charset=UTF-16)", but I cannot
find UTF-16 listed in
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets

Dave, Arnaud, and Ian: Since you folks are the editors of the HTML 4.0
Specification, could you submit an IANA registration for the charset UTF-16?

Or should someone from the Unicode Consortium do that?

Bob Jung wrote:

> The W3C recommends big-endian
> (http://www.w3.org/TR/REC-html40/charset.html):
>
> When HTML text is transmitted in UTF-16 (charset=UTF-16),
> text data should be transmitted in network byte
> order ("big-endian", high-order byte first) in accordance
> with [ISO10646], Section 6.3 and [UNICODE], clause
> C3, page 3-1.
>
> Furthermore, to maximize chances of proper interpretation,
> it is recommended that documents transmitted as
> UTF-16 always begin with a ZERO-WIDTH NON-BREAKING SPACE
> character (hexadecimal FEFF, also
> called Byte Order Mark (BOM)) which, when byte-reversed,
> becomes hexadecimal FFFE, a character
> guaranteed never to be assigned. Thus, a user-agent
> receiving a hexadecimal FFFE as the first bytes of a text
> would know that bytes have to be reversed for the remainder
> of the text.
>
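
For illustration, the check described in the quoted passage might look
roughly like the sketch below. The function name decode_utf16 is
hypothetical, and the fallback to big-endian when no BOM is present is an
assumption based on the "network byte order" recommendation above, not
something the specification spells out as an algorithm.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, using a leading BOM (if any) to pick the byte order."""
    if data.startswith(codecs.BOM_UTF16_BE):
        # First bytes are FE FF: the BOM reads as FEFF, so the data is big-endian.
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # First bytes are FF FE: the BOM reads as FFFE when taken as big-endian,
        # so the bytes must be reversed, i.e. the data is little-endian.
        return data[2:].decode("utf-16-le")
    # Assumption: with no BOM, fall back to network byte order (big-endian).
    return data.decode("utf-16-be")
```

With this sketch, the byte strings FE FF 00 48 00 69 and FF FE 48 00 69 00
both decode to the same two-character text.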


