Re: HTML - i18n / NCR & charsets

From: Misha Wolf (
Date: Tue Nov 26 1996 - 21:35:33 EST

If we are considering Web pages using Windows Code Pages, in which
illegal numeric character references have been used for characters
in the range 80-9F hex (decimal 128-159), then there will be no clash
with anything in Unicode, as these values do not represent printable
characters in Unicode or, for that matter, in ISO 8859-X. A permissive
browser will simply map these to the expected characters.
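
As a rough illustration of what such a permissive mapping might look
like, here is a minimal sketch in Python. It assumes the page author
meant Windows code page 1252; the names NCR, permissive_ncr and
decode_ncrs are hypothetical and not taken from any browser.

```python
import re

# Matches decimal numeric character references such as &#147;
NCR = re.compile(r"&#([0-9]+);")

def permissive_ncr(match):
    """Resolve one NCR, treating 128-159 as Windows-1252 bytes."""
    code = int(match.group(1))
    if 128 <= code <= 159:
        # Assumption: the author meant a cp1252 byte, not a control code.
        try:
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # A few values in this range are unassigned in cp1252.
            return "\ufffd"
    if code > 0x10FFFF:
        # Beyond the Unicode code space: substitute the replacement character.
        return "\ufffd"
    # Everything else is taken as a Unicode code point, as RFC 1866 intends.
    return chr(code)

def decode_ncrs(text):
    """Replace every decimal NCR in text with the character it denotes."""
    return NCR.sub(permissive_ncr, text)
```

With this sketch, decode_ncrs("&#147;quoted&#148;") yields the text
wrapped in curly double quotes, while legal references such as &#233;
are unaffected.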



On Tue, 26 Nov 1996, Misha Wolf wrote:

> The following extract from RFC 1866, "Hypertext Markup Language - 2.0" shows
> that legal numeric character references have been based on Unicode for quite
> some time and certainly prior to the I18N draft.
>

I quite agree here, and I do acknowledge this; but I do insist that
current practice is the problem. Doing a quick scan over all reachable
pages linked in from the webdirectory ( last night, I find a substantial
number of pages which would be broken: about 7%/4K pages. Of these,
about a fifth date from before RFC 1866.
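
A scan of this kind could be sketched roughly as follows. This is not
the script that was actually run; the function names are hypothetical,
and it assumes the pages have already been fetched as strings.

```python
import re

# Matches decimal numeric character references such as &#150;
NCR = re.compile(r"&#([0-9]+);")

def suspect_ncrs(html):
    """Return the NCR values in 128-159 found in html, in document order."""
    return [int(n) for n in NCR.findall(html) if 128 <= int(n) <= 159]

def fraction_affected(pages):
    """Fraction of pages containing at least one NCR in 128-159."""
    pages = list(pages)
    if not pages:
        return 0.0
    hits = sum(1 for page in pages if suspect_ncrs(page))
    return hits / len(pages)
```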

But *AGAIN* I acknowledge that there _should_ be no problems; people
should not have relied on NCRs in the low top-bit range (128-159), but
they have done so. And if you have easy ways of marking your pages such
that you do not break existing practice, you should do so.

