Re: HTML - i18n / NCR & charsets

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Wed Nov 27 1996 - 09:23:41 EST


unicode@Unicode.ORG writes:

> >According to the HTML I18N spec, all that is needed in this case is to
> >specify CHARSET=CP1251, and the text would be correctly converted to the
> >equivalent Unicodes.
>
> The issue is not the coded content of the document, about which you are
> correct. The issue is numeric character references of the form &#nnnn;.
> Some HTML documents today use numeric references in the C1 range,
> assuming they are the extra characters in cp1251. This is contrary to the
> i18n spec, which states that all numeric character references refer to
> Unicode. This means that all references in the C1 range are illegal
> according to the spec.
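A minimal Python sketch of the distinction drawn in the quoted text, assuming only decimal references (`&#nnnn;`) are handled. The helper names `resolve_ncrs` and `cp1251_guess` are hypothetical and do not come from any HTML library. The declared charset governs how document bytes become Unicode, while a numeric reference names a code point directly, so `&#150;` lands in the C1 range rather than on the cp1251 en dash (byte 0x96, U+2013) that such a document's author probably meant.

```python
import re

# Decimal numeric character references such as "&#8211;".
NCR = re.compile(r"&#([0-9]+);")


def resolve_ncrs(text: str) -> str:
    """Replace decimal NCRs with the UCS/Unicode characters they name.

    References in the C1 range (128-159) are rejected instead of being
    reinterpreted through a legacy code page such as cp1251.
    """
    def repl(m: "re.Match[str]") -> str:
        n = int(m.group(1))
        if 128 <= n <= 159:
            raise ValueError(f"&#{n}; is in the C1 range")
        return chr(n)

    return NCR.sub(repl, text)


def cp1251_guess(n: int) -> str:
    """What a C1-range reference would mean if misread as a cp1251 byte."""
    return bytes([n]).decode("cp1251")


if __name__ == "__main__":
    # Document bytes are converted to Unicode via the declared charset...
    body = b"\xcf\xf0\xe8\xe2\xe5\xf2".decode("cp1251")
    print(" ".join(f"U+{ord(c):04X}" for c in body))
    # ...but an NCR always names a Unicode code point, whatever the charset.
    print(f"U+{ord(resolve_ncrs('&#8211;')):04X}")  # U+2013
    print(f"U+{ord(cp1251_guess(150)):04X}")        # U+2013, the likely intent
    try:
        resolve_ncrs("a &#150; b")
    except ValueError as err:
        print(err)                                  # &#150; is in the C1 range
```

Rejecting the reference, rather than silently mapping it through cp1251, is a design choice in this sketch that follows the spec's reading quoted above. A lenient browser might choose the guess instead.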

A subtlety: the i18n spec refers to UCS, which has consequences
when going beyond the BMP. There UCS has well-defined code positions,
while I do not know whether Unicode does.
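To illustrate what "beyond the BMP" means for a reference: a decimal value above 65535, such as `&#66304;` (U+10300), names a single code position, yet a 16-bit encoding needs two units for it. The sketch below computes that UTF-16 surrogate pair under present-day Python/Unicode conventions; it is an illustration, not something taken from the i18n spec, and `to_surrogates` is a hypothetical helper name.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is not a supplementary code point")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low


if __name__ == "__main__":
    n = 66304                        # the value in a reference like "&#66304;"
    hi, lo = to_surrogates(n)
    print(f"U+{n:X} -> {hi:04X} {lo:04X}")  # U+10300 -> D800 DF00
    # Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
    assert chr(n).encode("utf-16-be") == bytes([hi >> 8, hi & 0xFF, lo >> 8, lo & 0xFF])
```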

Keld



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:32 EDT