Re: HTML - i18n / NCR & charsets

From: Alain LaBont/e'/
Date: Wed Nov 27 1996 - 00:09:21 EST

At 18:02 26/11/96 -0800, Keld wrote:
>Misha Wolf writes:
>> If we are considering Web pages using Windows Code Pages, in which
>> illegal numeric character references have been used for characters
>> in the range 80-9F (decimal 128-159) then there will be no clash
>> with anything in Unicode as these values do not represent characters
>> in Unicode or, for that matter, in ISO 8859-X. A permissive browser
>> will simply map these to the expected characters.

[Keld] :
>I just checked, the AMD 3 to 10646 says that C1 is reserved
>for control characters, and thus it cannot be used for graphic
>characters like in CP1251

This is my understanding too. I agree that the C1 space is implicitly covered
in the UCS (in other words, it is implicitly included in the UCS as meaning
control characters), and I would assume the same holds for UNICODE. This is a
big concern for French: the way Windows extended the Latin 1 8-bit table to
code <oe>, <OE> and <Y:> in that space creates a huge difficulty. It calls for
a new 8-bit code deprecating Latin 1 (compatible with it, except for unused
and useless characters such as the stand-alone accents) to avoid the
disappearance of those characters. Then we could have straightforward
conversions for as long as 8-bit codes coexist with the UCS. Expect such a
project in ISO, beginning with a registration.
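
To make the clash concrete, here is a minimal sketch in Python (not part of
the original exchange). It assumes Python's built-in "cp1252" codec as a
stand-in for the Windows Latin 1 extension and "latin-1" for ISO 8859-1. The
names FRENCH_BYTES, decode_both and resolve_ncr_permissively are hypothetical,
chosen for illustration only. It shows how the same byte is a French letter
in one table and a C1 control in the other, and how a permissive browser
might map the illegal numeric character references 128-159.

```python
import re
import unicodedata

# Byte values where the Windows table puts <OE>, <oe> and <Y:>.
# In ISO 8859-1 these positions fall in the C1 control range (0x80-0x9F).
FRENCH_BYTES = (0x8C, 0x9C, 0x9F)


def decode_both(byte_value):
    """Return the (cp1252, latin-1) interpretations of a single byte."""
    raw = bytes([byte_value])
    return raw.decode("cp1252"), raw.decode("latin-1")


def resolve_ncr_permissively(text):
    """Map &#128;..&#159; to the cp1252 character, as a permissive browser might.

    References outside that range, and positions that cp1252 leaves
    unassigned, are returned untouched.
    """
    def repl(match):
        code = int(match.group(1))
        if 128 <= code <= 159:
            try:
                return bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                return match.group(0)  # unassigned in cp1252
        return match.group(0)

    return re.sub(r"&#(\d+);", repl, text)


if __name__ == "__main__":
    for b in FRENCH_BYTES:
        windows_char, latin1_char = decode_both(b)
        print(
            hex(b),
            unicodedata.name(windows_char),
            "vs category",
            unicodedata.category(latin1_char),  # "Cc" means control character
        )
    print(resolve_ncr_permissively("c&#156;ur"))
```

The resolver deliberately leaves every other reference alone, so it only
illustrates the permissive mapping described above, not a full parser.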

Alain LaBonté

