Re: UTF-16 and HTML META charset

From: Piotr Trzcionkowski (ptrzcionkowski@famur.com.pl)
Date: Sat Feb 19 2000 - 14:54:39 EST


From: "Erik van der Poel" <erik@netscape.com>

> > all my pages in utf-16 have a proper meta declaration.
>
> HTML's META charset does not work for non-ASCII-based character
> encodings, such as UTF-16, UCS-4 and EBCDIC.

It "works" the same way as ANY declaration in the body :-))))

A web browser downloads enough bytes (possibly the whole document) to identify the encoding, then uses the detected encoding to decode the rest of the content. As you wrote later, the better way is to declare the encoding before the body is downloaded, so that it can be decoded on the fly.

The Internet suite from your organization, Netscape, even ignores the META declaration in the body. For this reason it is useless for content in any 8-bit encoding in a multinational environment like the Internet. Fortunately, it does look for the BOM.

>Some browser versions may
> autodetect UTF-16 based on the presence of zero-valued octets and/or the
> BOM.

May? Have you found a browser that supports Unicode and doesn't check for the BOM?
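
For illustration only, here is a minimal sketch of the kind of BOM check discussed above, written in Python. The helper name sniff_bom is hypothetical and is not taken from any browser's code; it assumes only the UTF-8 and UTF-16 marks matter and looks at nothing but the first bytes of the body.

```python
import codecs

# Hypothetical helper (not from any real browser): map a leading
# byte order mark to an encoding name.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# Usage: skip the BOM, then decode the rest with the detected encoding.
body = codecs.BOM_UTF16_BE + "<html>".encode("utf-16-be")
encoding, skip = sniff_bom(body)
text = body[skip:].decode(encoding) if encoding else None
```

A document without a BOM yields (None, 0), and the reader then has to fall back on something else, such as the HTTP header or a heuristic.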

>The most standard way to declare UTF-16 in HTTP is via
> Content-Type.

Of course, but I have no way to change the HTTP header. I have to declare the encoding in the body.
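
To make the difficulty with an in-body declaration concrete, here is a small Python sketch. It assumes a naive reader that scans the raw bytes for the ASCII string "charset=" before it knows the encoding; that scanner is an assumption for illustration, not a description of what any particular browser does.

```python
# The declaration as it would appear in the page source.
META = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-16">'

ascii_bytes = META.encode("ascii")      # one byte per character
utf16_bytes = META.encode("utf-16-be")  # two bytes per character, no BOM

needle = b"charset="

# An ASCII-based scan finds the declaration in the 8-bit form...
found_in_ascii = needle in ascii_bytes
# ...but not in the UTF-16 form, where a zero octet sits between
# every pair of ASCII letters.
found_in_utf16 = needle in utf16_bytes

# The zero octets themselves are what an autodetecting reader can key on.
zero_octets = utf16_bytes.count(0)
```

Here zero_octets equals the number of characters in the declaration, which is the pattern the quoted remark about zero-valued octets refers to.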

And I'm not sure that some "intelligent" admin won't use it to block content at a proxy cache :-))) as was done on Polish Usenet with UTF-8.

>For example:
>
> Content-Type: text/html; charset=UTF-16BE
>
> See the following for the definition of UTF-16BE:
>
> http://www.ietf.org/internet-drafts/draft-hoffman-utf16-05.txt

That is more a proposal for declaring the byte order than for signalling that the stream is 16-bit, which is the main dilemma.
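
A short Python sketch of the distinction, using only the standard codecs: the "utf-16-be" and "utf-16-le" labels fix the byte order and add no BOM, while the plain "utf-16" label relies on a BOM in the data. The sample character is arbitrary.

```python
import codecs

# The same character under the two explicit byte orders (no BOM is written).
be = "A".encode("utf-16-be")  # b"\x00A"
le = "A".encode("utf-16-le")  # b"A\x00"

# Reading with the wrong byte order silently yields a different character.
wrong = be.decode("utf-16-le")  # U+4100 instead of "A"

# The plain "utf-16" label puts the byte order into the data itself:
# the encoder prepends a BOM (in the platform's native order) ...
with_bom = "A".encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# ... and the decoder consumes the BOM and follows it.
decoded = (codecs.BOM_UTF16_BE + be).decode("utf-16")
```

In every case the reader must already know it is dealing with 16-bit code units before either label or BOM helps, which is the point made above.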


