Re: Usage of CP1252 characters on www.msnbc.com

From: Misha Wolf (misha.wolf@reuters.com)
Date: Tue Jul 08 1997 - 08:41:55 EDT


Chris Pratley wrote:

> At the start of the Internet phenomenon, NCRs were not defined to be
> Unicode (in fact to my knowledge, this is STILL not a standard
> officially, which is why it is an RFC). People had data that used these
> characters. There were no named entities for things like smart quotes.
> They had to be round-tripped somehow. I don't think the issues are as
> black and white as you claim. In any case, I'm not interested in that
> discussion. I'm interested in addressing the issue I posted about.

I'm not sure what you mean by "this is STILL not a standard officially, which
is why it is an RFC". RFC 1866, "Hypertext Markup Language - 2.0", is a
"Standards Track" RFC:

<cite>

1.2.1. Documents

   A document is a conforming HTML document if:

        * It is a conforming SGML document, and it conforms to the
        HTML DTD (see 9.1, "HTML DTD").

            NOTE - There are a number of syntactic idioms that
            ...

        * It conforms to the application conventions in this
        specification. For example, the value of the HREF attribute
        of the <A> element must conform to the URI syntax.

        * Its document character set includes [ISO-8859-1] and
        agrees with [ISO-10646]; that is, each code position listed
        in 13, "The HTML Coded Character Set" is included, and each
        code position in the document character set is mapped to the
        same character as [ISO-10646] designates for that code
        position.

            NOTE - The document character set is somewhat
            independent of the character encoding scheme used to
            represent a document. For example, the `ISO-2022-JP'
            character encoding scheme can be used for HTML
            documents, since its repertoire is a subset of the
            [ISO-10646] repertoire. The critical distinction is
            that numeric character references agree with
            [ISO-10646] regardless of how the document is
            encoded.

</cite>
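
To make the last NOTE concrete, here is a minimal Python sketch of what it implies for the CP1252 "smart quote" case. It assumes a deliberately naive expander, `expand_ncrs`, which is a hypothetical helper written for this illustration and not part of any library. It handles decimal references only and reads each number strictly as an ISO 10646 code position, as the text above requires. Python's own `html.unescape` is avoided on purpose, because it follows later HTML rules and silently remaps references in the 128-159 range to their CP1252 characters.

```python
import re


def expand_ncrs(text: str) -> str:
    """Expand decimal numeric character references (&#N;).

    Hypothetical helper for illustration: each N is taken strictly as an
    ISO 10646 code position, whatever encoding the document arrived in.
    """
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)


# In CP1252, byte 0x93 is the left double "smart quote".
smart_quote = bytes([0x93]).decode("cp1252")

# Its ISO 10646 code position is 0x201C (8220), so that is the number a
# conforming numeric character reference has to carry.
conforming_ncr = "&#{};".format(ord(smart_quote))

# Reusing the CP1252 byte value as the reference number instead yields
# code position 0x93, which is a C1 control character, not a quote.
cp1252_style = expand_ncrs("&#147;")

# The reference is independent of the character encoding scheme: the same
# markup survives a round trip through ISO-2022-JP and still expands to
# the same character.
wire_bytes = "日本語 &#8220;".encode("iso2022_jp")
expanded = expand_ncrs(wire_bytes.decode("iso2022_jp"))
```

Under these assumptions, `&#8220;` is the reference that round-trips the smart quote, while `&#147;` only works in browsers that treat the number as a CP1252 byte.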

And, of course, we are all waiting for news of HTML N (where N > 3.2).

Misha

------------------------------------------------------------------------
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of
Reuters Ltd.


