Re: browsers and unicode surrogates

From: jshin@mailaps.org
Date: Mon Apr 22 2002 - 12:19:15 EDT


On Fri, 19 Apr 2002, Tom Gewecke wrote:

> > With BOM at the beginning, Netscape 4.x, Netscape 6.x/Mozilla and MS
> >IE 5.x/6.x can handle them without much problem except that support
> >for characters above BMP varies from browser to browser as Tex tried to
> >demonstrate in his test pages.

  The autodetection of UTF-16(LE|BE) and UTF-32(LE|BE) doesn't seem
to be as robust as I thought. I should have conducted more extensive
experiments. I just tried UTF-16LE and UTF-16BE pages with a BOM at
the beginning (http://jshin.net/i18n/utftest/bom_utf16le.html and
http://jshin.net/i18n/utftest/bom_utf16be.html) with Netscape 6/Mozilla,
MS IE 6 and Netscape 4.x. They appeared to work fine at first, but when
I reloaded the page or went back and forth, the autodetection sometimes
failed.
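
  For reference, here is a minimal Python sketch of what BOM-based
detection amounts to. It is only an illustration and not what any of
the browsers above actually do; the names sniff_bom and decode_page and
the latin-1 fallback are my own assumptions.

```python
import codecs

# Check the 4-byte BOMs first: the UTF-32LE BOM (FF FE 00 00) starts
# with the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (codec_name, bom_length), or (None, 0) if no BOM is found.

    Caveat: a UTF-16LE page whose first character after the BOM is
    U+0000 is indistinguishable from UTF-32LE by this test.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_page(data: bytes, fallback: str = "latin-1") -> str:
    """Decode a page by its BOM; the fallback codec is an assumption."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name or fallback)
```

  A detector like this depends only on the bytes of the page, so it
should give the same answer on every load.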

> Thanks for the info! Do you know of any other utf-16 pages on the web for
> testing? I did a lot of searching and could not find any (except a case
> where the links were broken).

  As you wrote and I agreed, it's not a good idea to put up a web
page in UTF-16/UTF-32, and that's why you can't find any.

> I'm using Mac OS X and it can read utf-16 ok normally, but not Texin's,
> perhaps because of the "endianness." I believe his page is LE and utf-16
> html should be BE. But my understanding of that issue is VERY limited...

  Well, Mark et al. recently discussed the issue at great length....
Anyway, I just put up a set of test pages. There are 20 combinations:

  - BOM or BOMless
  - Big endian or Little endian
  - UTF-16 or UTF-32
  - Whether the HTTP server emits a Content-Type header with a
    MIME charset parameter, as below:

      Content-Type: text/html; charset=UTF-32LE

    and, if so, whether the MIME charset name ends in 'BE|LE'.
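
  As a rough sketch of how one variant along these axes could be
produced, the Python below builds the page bytes and the matching
Content-Type value. The function names, the "none"/"plain"/"suffixed"
mode labels, and the bare "text/html" value for the no-charset case
are my own assumptions for illustration, not how the test pages were
actually generated.

```python
import codecs

# Byte order marks for each (encoding family, endianness) pair.
BOMS = {
    ("utf-16", "le"): codecs.BOM_UTF16_LE,
    ("utf-16", "be"): codecs.BOM_UTF16_BE,
    ("utf-32", "le"): codecs.BOM_UTF32_LE,
    ("utf-32", "be"): codecs.BOM_UTF32_BE,
}


def make_page(text: str, family: str, endian: str, with_bom: bool) -> bytes:
    """Encode text in the given family/endianness, optionally with a BOM."""
    # The explicit "-le"/"-be" codecs never write a BOM themselves,
    # so it is prepended by hand when requested.
    body = text.encode(f"{family}-{endian}")
    return (BOMS[(family, endian)] if with_bom else b"") + body


def content_type(family: str, endian: str, mode: str) -> str:
    """Build the Content-Type value for one header variant.

    mode (hypothetical labels):
      "none"     - no charset parameter at all
      "plain"    - charset=UTF-16 or charset=UTF-32
      "suffixed" - charset name ending in BE or LE
    """
    if mode == "none":
        return "text/html"
    label = family.upper()
    if mode == "suffixed":
        label += endian.upper()
    return f"text/html; charset={label}"
```

  For example, content_type("utf-32", "le", "suffixed") gives the
header value shown above.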

  My test pages don't yet have characters beyond the BMP (I just
recycled a page I made a long time ago for Korean testing). I may add
them later. (Tex, can I use your sample page? I'd rather put up a page
with some content instead of just a list of characters.)

   Jungshik Shin


