Re: Novice question

From: Philippe Verdy
Date: Mon Mar 22 2004 - 13:53:16 EST

    From: "Stefan Persson"
    > Philippe Verdy wrote:
    > > Some browsers will need NCRs, some will accept UTF-8, some will need a
    > > "x-user-defined" encoding which is not a standard encoding for use in
    > > HTML 3.2...
    > Isn't that only the case with non-BMP code points?

    I don't know, but for now IE is known to support non-BMP characters only
    through NCRs, even in UTF-8 documents, and only if the "x-user-defined"
    encoding is specified or the user manually selects the "User-Defined"
    encoding in the Display menu, and only after patching some registry entries.
    ("x-user-defined" is a non-standard alias of UTF-8 with special behavior: it
    selects a specific user-defined default font instead of the per-script
    default font.)

    However, NCRs are sometimes the only way to display non-Latin characters in
    some browsers, because those browsers rely only on the user's locale to
    determine the language and its "preferred" encoding.
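
    As an illustration of the NCR approach, here is a minimal sketch that
    rewrites every non-ASCII character as a decimal NCR. The function name
    toNcr is hypothetical, and the sketch assumes a modern engine that can
    iterate a string by code point; it does not escape "&" or "<", so it is
    not a complete HTML encoder.

```javascript
// Hypothetical helper: replace every non-ASCII code point with a decimal NCR.
function toNcr(text) {
  let out = "";
  // for...of walks the string by code point, so a surrogate pair is seen once.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? "&#" + cp + ";" : ch;
  }
  return out;
}
```

    The output is plain ASCII, so it should survive whatever encoding the
    browser guesses from the user's locale.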

    As far as I know, Unicode is the only alternative to ISCII for Indian Brahmic
    scripts. However, Urdu written in the Arabic script may be supported by the
    Arabic ISO 8859 charset (ISO 8859-6).

    UTF-8 is very viable for now for all scripts that can be restricted to the BMP.
    UTF-8 support outside the BMP is often bogus or nonexistent in too many
    browsers. (I did not try, but IE may support the CESU-8 or UTF-16 encoding for
    characters outside the BMP, because UTF-16 is the encoding used internally for
    strings handled through its JavaScript interface.)

    One can test if JavaScript supports UTF-32 strings very simply, by making a
    character from a non-BMP codepoint value such as 0x10FFFD, and reading the code
    of the first "character" of that string.
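
    A minimal sketch of that test, assuming String.fromCharCode is the way
    the character is made (the helper name probeNonBmp is hypothetical). In
    an engine that follows the ECMAScript definition, fromCharCode keeps only
    the low 16 bits of its argument, so the value read back differs from the
    one passed in; an engine with 32-bit string units would return it intact.

```javascript
// Hypothetical helper: report how the engine treats a code point above U+FFFF.
function probeNonBmp(codePoint) {
  const s = String.fromCharCode(codePoint);
  const first = s.charCodeAt(0);
  return {
    length: s.length,
    firstUnit: first,
    // True only if the engine stored the full value in one string unit.
    storesFullCodePoint: first === codePoint,
  };
}

const result = probeNonBmp(0x10fffd);
```

    If result.storesFullCodePoint is false, the strings are not UTF-32 and
    the surrogate test is the next thing to try.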

    One can also test in JavaScript whether UTF-16 is supported by performing the
    same test with the surrogate code units 0xD800 and 0xDC00.
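
    A sketch of the surrogate test, with a hypothetical helper named
    probeSurrogatePair. It assumes a modern engine: codePointAt was added in
    ES2015 and is used here only to confirm that the two units are read back
    as one supplementary character.

```javascript
// Hypothetical helper: build a string from two surrogate code units and
// check whether the engine reads them back as one supplementary code point.
function probeSurrogatePair(high, low) {
  const s = String.fromCharCode(high, low);
  return {
    units: s.length,
    first: s.charCodeAt(0),
    second: s.charCodeAt(1),
    // codePointAt (ES2015) combines a valid pair into a single code point.
    codePoint: s.codePointAt(0),
  };
}

const pair = probeSurrogatePair(0xd800, 0xdc00);
```

    Two units that read back unchanged, with a combined code point of
    0x10000, indicate that the string is stored as UTF-16.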

    Depending on this result, you may then be able to display strings containing
    non-BMP characters, but only if the user manually selects the "User-Defined"
    encoding in IE's display menu or the page is sent with an explicit header
    specifying the charset: "Content-Type: text/html; charset=x-user-defined".
    This, however, cannot be done through JavaScript...
