Re: Novice question

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Mar 22 2004 - 13:53:16 EST

Next message: Peter Kirk: "Re: Novice question"

Previous message: Stefan Persson: "Re: Novice question"
In reply to: Stefan Persson: "Re: Novice question"
Next in thread: Peter Kirk: "Re: Novice question"
Reply: Peter Kirk: "Re: Novice question"

From: "Stefan Persson" <alsjebegrijptwatikbedoel@yahoo.se>
> Philippe Verdy wrote:
> > Some browsers will need NCRs, some will accept UTF-8, some will need a
> > "x-user-defined" encoding which is not a standard encoding for use in
> > conforming HTML 3.2...
>
> Isn't that only the case with non-BMP code points?

I don't know, but for now IE is known to support non-BMP characters only through
NCRs, even in UTF-8 documents, and only if the "x-user-defined" encoding is
specified (a non-standard alias of UTF-8 with the special behavior of selecting
a specific user-defined default font instead of the per-script default font) or
the "User-Defined" encoding is selected manually in the Display menu, and only
after patching some registry entries.

However, NCRs are sometimes the only way to display non-Latin characters in some
browsers, as they rely only on the user's locale to get the language and its
"preferred" encoding.

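As a rough illustration of the NCR approach, here is a minimal sketch that
rewrites a string so that every non-ASCII character becomes a decimal NCR. The
helper name toNcr is hypothetical (it is not from this thread), and it does not
escape markup characters such as "&" or "<".

```javascript
// Hypothetical helper: replace every non-ASCII character with a decimal
// numeric character reference (NCR), e.g. U+0915 becomes "&#2325;".
function toNcr(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    let code = text.charCodeAt(i);
    // Combine a high/low surrogate pair into one code point so that a
    // non-BMP character becomes a single NCR rather than two.
    if (code >= 0xD800 && code <= 0xDBFF && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00);
        i++; // skip the low surrogate, it has been consumed
      }
    }
    out += code < 0x80 ? String.fromCharCode(code) : "&#" + code + ";";
  }
  return out;
}
```

The result contains only ASCII, so it should not depend on which encoding the
browser guesses from the user's locale.
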
As far as I know, Unicode is the only alternative to ISCII for Indian Brahmic
scripts; however, Urdu written with the Arabic script may be supported with the
Arabic ISO 8859 charset (ISO 8859-6).

UTF-8 is very viable for now for all scripts that can be restricted to the BMP.
UTF-8 support outside the BMP is often bogus or nonexistent in too many
browsers... (I did not try, but IE may support the CESU-8 or UTF-16 encoding to
get characters outside the BMP, because UTF-16 is the encoding used internally
for strings handled through its JavaScript interface).
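
To make the difference between UTF-8 and CESU-8 concrete, here is a small
sketch that encodes a single code point both ways. The function names utf8Bytes
and cesu8Bytes are made up for this illustration. It only shows the byte
layouts and says nothing about what any particular browser accepts.

```javascript
// Encode one code point (0..0x10FFFF) with the UTF-8 bit layout.
// No check is made for surrogates, because cesu8Bytes below relies on
// encoding them; a strict UTF-8 encoder would reject them.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)];
  if (cp < 0x10000) {
    return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)];
  }
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ];
}

// Encode one code point as CESU-8: a non-BMP code point is first split into
// its UTF-16 surrogate pair, and each surrogate gets its own 3-byte sequence.
function cesu8Bytes(cp) {
  if (cp < 0x10000) return utf8Bytes(cp);
  const v = cp - 0x10000;
  const high = 0xD800 + (v >> 10);
  const low = 0xDC00 + (v & 0x3FF);
  return utf8Bytes(high).concat(utf8Bytes(low));
}
```

For U+10000, utf8Bytes gives the four bytes F0 90 80 80, while cesu8Bytes gives
the six bytes ED A0 80 ED B0 80. BMP characters are identical in both.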

One can test if JavaScript supports UTF-32 strings very simply, by making a
character from a non-BMP codepoint value such as 0x10FFFD, and reading the code
of the first "character" of that string.
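
A minimal sketch of that test, assuming the string is built with
String.fromCharCode (the message does not say which call to use). The helper
name probeSingleCodePoint is hypothetical.

```javascript
// Build a string from one code point value and report what the engine
// actually stored as the first "character".
function probeSingleCodePoint(codePoint) {
  const s = String.fromCharCode(codePoint);
  return { length: s.length, firstUnit: s.charCodeAt(0) };
}

const probe = probeSingleCodePoint(0x10FFFD);
// An engine with 32-bit string units would give back 0x10FFFD here.
// An engine with 16-bit string units keeps only the low 16 bits (0xFFFD).
const hasUtf32Strings = probe.firstUnit === 0x10FFFD;
```

In engines that follow the ECMAScript specification, String.fromCharCode
truncates its argument to 16 bits, so hasUtf32Strings comes out false there.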

One can also test in JavaScript whether UTF-16 is supported by performing the
same test with 0xD800 and 0xDC00.
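
The surrogate variant could look like the sketch below. The helper name
probeSurrogatePair is hypothetical, and codePointAt is a later addition to the
language (ES2015), used here only to show how the pair is read back.

```javascript
// Build a string from a high and a low surrogate and report how the engine
// stored them.
function probeSurrogatePair(high, low) {
  const s = String.fromCharCode(high, low);
  return {
    length: s.length,                          // 2 if both units are kept
    units: [s.charCodeAt(0), s.charCodeAt(1)], // the raw 16-bit code units
    codePoint: s.codePointAt(0),               // the pair read as one code point
  };
}

const pair = probeSurrogatePair(0xD800, 0xDC00);
// If both units survive unchanged, the string holds UTF-16 for U+10000.
const keepsSurrogates =
  pair.length === 2 && pair.units[0] === 0xD800 && pair.units[1] === 0xDC00;
```

This only shows that the string stores the pair intact. Whether the browser
then renders it as one character is a separate question.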

Depending on this result, you may then be able to display strings containing
non-BMP characters, but only if the user manually selects the "User-Defined"
encoding in IE's display menu or the page is sent with an explicit header
specifying the charset: "Content-Type: text/html; charset=x-user-defined".
This cannot be done through JavaScript...


This archive was generated by hypermail 2.1.5 : Mon Mar 22 2004 - 14:35:42 EST