unicode@Unicode.ORG writes:
> >According to the HTML I18N spec, all that is needed in this case is to
> >specify CHARSET=CP1251, and the text would be correctly converted to the
> >equivalent Unicodes.
>
> The issue is not the coded content of the document, about which you are
> correct. The issue is numeric character references of the form &#nnnn;.
> Some HTML documents today use numeric references in the C1 range,
> assuming they are the extra characters in cp1251. This is contrary to the
> i18n spec, which states that all numeric character references refer to
> Unicode. This means that all references in the C1 range are illegal
> according to the spec.
A subtlety: the i18n spec refers to UCS, which has a consequence
when going beyond the BMP. There UCS has well-defined numbers, while I
do not know whether Unicode does.
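
The C1-range problem in the quoted text can be illustrated with a small
Python sketch. It assumes a document author wrote references such as &#151;
meaning the cp1251 byte 0x97, and rewrites them as the UCS numbers that were
probably intended. The function name remap_c1_references and the
legacy_codec parameter are hypothetical, chosen for illustration only; the
i18n spec itself prescribes no such repair.

```python
import re

# Matches decimal numeric character references such as &#151;
_NCR = re.compile(r"&#(\d+);")


def remap_c1_references(text, legacy_codec="cp1251"):
    """Rewrite C1-range references (128-159) as UCS references.

    Assumption: a reference in the C1 range was meant as a byte value
    in legacy_codec, not as a UCS code position.
    """
    def fix(match):
        n = int(match.group(1))
        if not 0x80 <= n <= 0x9F:
            return match.group(0)  # outside C1: already a UCS number
        try:
            ch = bytes([n]).decode(legacy_codec)
        except UnicodeDecodeError:
            return match.group(0)  # byte undefined in the codec: leave as-is
        return "&#%d;" % ord(ch)

    return _NCR.sub(fix, text)


print(remap_c1_references("a &#151; b"))  # prints "a &#8212; b"
```

References outside 128-159 pass through untouched, since under the spec
they already name UCS code positions.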
Keld