From: Philippe Verdy (email@example.com)
Date: Mon Mar 22 2004 - 13:53:16 EST
From: "Stefan Persson" <firstname.lastname@example.org>
> Philippe Verdy wrote:
> > Some browsers will need NCRs, some will accept UTF-8, some will need a
> > "x-user-defined" encoding which is not a standard encoding for use in
> > HTML 3.2...
> Isn't that only the case with non-BMP code points?
I don't know, but for now IE is known to support non-BMP characters only through
NCRs, even in UTF-8 documents, AND only if the "x-user-defined" encoding is
specified (which is a non-standard alias of UTF-8 with special behavior to
select a specific user-defined default font instead of using the per-script
default font) or with manual selection of the "User-Defined" encoding in the
Display menu, and only after patching some registry entries.
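To make the NCR approach concrete, here is a minimal Python sketch that rewrites characters as hexadecimal numeric character references. The function name `to_ncr` and its `only_non_bmp` switch are made up for this illustration; nothing here is specific to IE.

```python
def to_ncr(text, only_non_bmp=True):
    """Replace selected characters with hexadecimal NCRs (&#xHHHH;).

    Hypothetical helper: by default only code points above U+FFFF are
    rewritten; with only_non_bmp=False every non-ASCII character is.
    """
    out = []
    for ch in text:
        cp = ord(ch)  # Python 3 strings are indexed by code point
        if cp > 0xFFFF or (not only_non_bmp and cp > 0x7F):
            out.append("&#x{:X};".format(cp))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # U+10400 lies outside the BMP, so it becomes an NCR.
    print(to_ncr("A\U00010400"))
```

The NCR always carries the code point itself, never the two surrogate values, so the output does not depend on the encoding the page is sent in.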
However, NCRs are sometimes the only way to display non-Latin characters in some
browsers, as they rely only on the user's locale to get the language and its
As far as I know, Unicode is the only alternative to ISCII for Indian Brahmic
Scripts; however, Urdu written with the Arabic script may be supported with the
Arabic ISO8859 charset.
UTF-8 is very viable for now for all scripts that can be restricted to the BMP.
UTF-8 support outside the BMP is often bogus or nonexistent in too many browsers...
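One likely reason for the difference: every BMP code point fits in at most three UTF-8 bytes, while anything above U+FFFF needs a four-byte sequence that a decoder must handle separately. A small Python sketch of the length rule (the name `utf8_length` is only illustrative):

```python
def utf8_length(cp):
    """Number of bytes UTF-8 uses for code point cp (illustrative helper)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3  # rest of the BMP
    return 4      # supplementary planes, U+10000..U+10FFFF


if __name__ == "__main__":
    # U+0041 (Latin), U+0915 (Devanagari) and U+10FFFD (plane 16 private use)
    for cp in (0x41, 0x915, 0x10FFFD):
        print(hex(cp), utf8_length(cp), chr(cp).encode("utf-8").hex())
```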
(I did not try, but IE may support the CESU-8 or UTF-16 encoding to get
characters out of the BMP, because UTF-16 is the encoding used internally within
character from a non-BMP codepoint value such as 0x10FFFD, and reading the code
of the first "character" of that string.
same test with 0xD800 and 0xDC00.
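As a rough illustration of the arithmetic behind such a test, the Python sketch below splits a non-BMP code point into its UTF-16 surrogate pair and shows how CESU-8 differs from UTF-8 for the same character. This is not the browser-side test itself; the helper names `surrogate_pair` and `cesu8` are made up here.

```python
def surrogate_pair(cp):
    """Return the (high, low) UTF-16 surrogates for a non-BMP code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    v = cp - 0x10000                      # 20-bit offset
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def cesu8(cp):
    """Encode one code point as CESU-8 (illustrative helper).

    BMP code points are encoded exactly as in UTF-8; a supplementary code
    point is encoded as its two surrogates, three bytes each.
    """
    if cp < 0x10000:
        return chr(cp).encode("utf-8", "surrogatepass")
    return b"".join(
        chr(unit).encode("utf-8", "surrogatepass")
        for unit in surrogate_pair(cp)
    )


if __name__ == "__main__":
    cp = 0x10FFFD
    high, low = surrogate_pair(cp)
    print("surrogates:", hex(high), hex(low))
    print("UTF-8 :", chr(cp).encode("utf-8").hex())
    print("CESU-8:", cesu8(cp).hex())
```

An implementation that stores strings as UTF-16 would report the high surrogate as the code of the first "character" of a string built from 0x10FFFD, whereas one that stores whole code points would report 0x10FFFD itself.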
Depending on this result, you may then be able to display strings containing
non-BMP characters, but still provided that the user manually selects the
"User-Defined" encoding in IE's display menu or the page is sent with an
explicit header specifying the charset: "Content-Type: text/html;
This archive was generated by hypermail 2.1.5 : Mon Mar 22 2004 - 14:35:42 EST