Re: Strange Browser Behavior

From: Simon Montagu (smontagu@smontagu.org)
Date: Mon Jan 08 2007 - 11:01:36 CST


Tom Gewecke wrote:
> Recently I found my browser (Firefox 2 Mac) converting a numerical
> character reference outside the Unicode code space into a sequence of
> Unicode codepoints, instead of simply "not defined." I would be
> grateful if anyone could tell me whether the Win Mozilla/Firefox or IE 7
> browsers do this too. A test page is here:
>
> http://homepage.mac.com/thgewecke/2297277.html
>
> It looks to me like the sequence being produced in this particular case
> is actually FEFF E083 DDBD. If anyone has an idea about what kind of
> process could translate &#2297277; into that sequence or its UTF-8 or
> binary equivalent, I'd be interested to hear it.

The process for translating a UTF-32 codepoint into a UTF-16 surrogate
pair, without first checking whether the UTF-32 codepoint is within the
legal range:

(0x230DBD - 0x10000) / 0x400 + 0xD800 = 0xE083
(0x230DBD - 0x10000) % 0x400 + 0xDC00 = 0xDDBD
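
In code, a minimal C sketch of that unchecked conversion (the names are
mine, illustrative only, not taken from the Mozilla source):

#include <stdio.h>

/* Convert a code point to UTF-16 code units with no range check,
 * reproducing the behavior described above. */
static void to_utf16_unchecked(unsigned long cp, unsigned short out[2])
{
    out[0] = (unsigned short)((cp - 0x10000) / 0x400 + 0xD800);
    out[1] = (unsigned short)((cp - 0x10000) % 0x400 + 0xDC00);
}

int main(void)
{
    unsigned short units[2];
    to_utf16_unchecked(0x230DBD, units);       /* 2297277 decimal */
    printf("%04X %04X\n", units[0], units[1]); /* prints E083 DDBD */
    return 0;
}

Run as-is, this prints E083 DDBD, the surrogate pair seen on the test page.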

This is what I see on your test page with Firefox 2 on Windows. I'm not
sure why the sequence is preceded by a BOM for you.

For the record, the bug of not checking the original value was fixed
some time ago, and on a nightly build of Firefox I see a single U+FFFD
character.
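
For comparison, here is a sketch of a checked conversion under the same
assumptions (illustrative only, not the actual patch): anything above
0x10FFFF, or in the surrogate range, becomes a single U+FFFD.

/* Out-of-range or surrogate code points become U+FFFD instead of a
 * bogus surrogate pair. Returns the number of UTF-16 code units
 * written. */
static int to_utf16_checked(unsigned long cp, unsigned short out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        out[0] = 0xFFFD;               /* REPLACEMENT CHARACTER */
        return 1;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned short)cp;   /* BMP: one code unit */
        return 1;
    }
    out[0] = (unsigned short)((cp - 0x10000) / 0x400 + 0xD800);
    out[1] = (unsigned short)((cp - 0x10000) % 0x400 + 0xDC00);
    return 2;
}

With that check, 0x230DBD yields just U+FFFD, matching what the nightly
build shows.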


