RE: HTML - i18n / NCR & charsets

From: Peter Amstein (peteram@microsoft.com)
Date: Thu Nov 28 1996 - 01:01:49 EST

Next message: Geoffrey Waigh: "Thai encoding standards"
Previous message: Martin J. Duerst: "Re: HTML - i18n / NCR & charsets"

FrontPage 97 does (a), (b) and (c) correctly. While I'm not sure I
approve of the term "cheesy", it is true that FrontPage 1.0 and
FrontPage 1.1 write named character references for all characters
greater than 127. As pointed out below, those programs were only
designed for use with code page 1252 or ISO Latin 1. In retrospect
I wish we had not done that, but we did.

FrontPage 97 does not automatically "fix" the named character references
in a page it reads because that would be incompatible with item (c),
which is to always interpret them as the actual character named.
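
To make the rule concrete, here is a minimal Python sketch of the
interpretation described in the quoted message: the charset affects only
the raw octets (a), while numeric references (b) and entity names (c)
always denote the same Unicode character. It is an illustration only, not
FrontPage's implementation. The function name `interpret` is hypothetical,
and the sketch assumes an ASCII-compatible charset so that decoding the
octets first leaves the references intact.

```python
import html


def interpret(raw: bytes, charset: str) -> str:
    """Decode a fragment of HTML text (hypothetical helper).

    Assumes an ASCII-compatible charset, so '&', '#', ';' and the
    reference names survive the octet decoding step unchanged.
    """
    # (a) raw octets: their meaning depends on the declared charset.
    text = raw.decode(charset)
    # (b) numeric references and (c) entity names: resolved to Unicode
    # characters without consulting the charset at all.
    return html.unescape(text)


fragment = b"\xe0 &#1072; &agrave;"

# Same bytes, two charsets: only the first character differs.
as_cyrillic = interpret(fragment, "cp1251")   # octet 0xE0 read as U+0430
as_latin1 = interpret(fragment, "latin-1")    # octet 0xE0 read as U+00E0
```

In both results the second character is U+0430 and the third is U+00E0.
Only the first one, which came from a raw octet, follows the charset.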

Microsoft does provide fix-up programs that will convert all of the
named references in an HTML file to the corresponding octet,
thereby allowing you to fix a page before opening it with FrontPage 97.
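
For readers who want to see what such a fix-up amounts to, the following
is a rough Python sketch of the idea, not the Microsoft tool itself. It
assumes the page was written by an editor that treated every octet above
127 as code page 1252 and exported it as a Latin 1 entity name. The
function name `named_refs_to_octets` and its `authoring_codepage`
parameter are hypothetical. Markup-significant entities such as `&amp;`
and `&lt;` are left alone.

```python
import re
from html.entities import name2codepoint

# Matches named character references such as &agrave; in a byte string.
_NAMED_REF = re.compile(rb"&([A-Za-z][A-Za-z0-9]*);")


def named_refs_to_octets(page: bytes, authoring_codepage: str = "cp1252") -> bytes:
    """Turn named references back into the octets they were made from.

    Hypothetical helper: assumes the references were generated from
    single octets interpreted in `authoring_codepage`.
    """
    def replace(match: "re.Match[bytes]") -> bytes:
        name = match.group(1).decode("ascii")
        codepoint = name2codepoint.get(name)
        # Keep unknown names and ASCII entities (&amp;, &lt;, &gt;, &quot;).
        if codepoint is None or codepoint < 128:
            return match.group(0)
        try:
            return chr(codepoint).encode(authoring_codepage)
        except UnicodeEncodeError:
            # Character has no octet in that code page: leave the reference.
            return match.group(0)

    return _NAMED_REF.sub(replace, page)


polluted = b"<p>&ecirc;&icirc;&ograve; &amp; co</p>"
fixed = named_refs_to_octets(polluted)
# The restored octets read correctly once the real charset is applied.
russian = fixed.decode("cp1251")
```

Here the three entity names become the octets 0xEA 0xEE 0xF2, which
under windows-1251 are the Cyrillic letters the author originally typed.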

-Peter Amstein
>
>-----Original Message-----
>From: unicode@Unicode.ORG [SMTP:unicode@Unicode.ORG]
>Sent: Wednesday, November 27, 1996 12:30 PM
>To: unicode@Unicode.ORG
>Subject: Re: HTML - i18n / NCR & charsets
>
>We have three representations:
>(a) raw octets
>(b) numeric character references
>(c) entity names.
>
>Numeric character references are, of course, supposed to refer to Unicode/
>ISO 10646.
>
>The charset, whether specified via HTTP or HTML or a menu, should affect
>the interpretation of (a). It should *not* affect the interpretation of
>(b) or (c). The major browsers were broken in this regard and are being
>gradually fixed.
>
>An example of a "cheesy little editor" that created lots of polluted Web
>pages was FrontPage 1.0. Though Microsoft sold it as suitable only for
>Code Page 1252, lots of people used it on other Code Pages. FP 1.0 simply
>exports stuff as if it were CP 1252, hence a Russian Web page ends up full
>of Latin 1 entity names! FP 2.0 (aka 97) has, I believe, fixed this.
>
>The various Internet Assistants did the same foul thing. I hope they've
>been fixed.
>
>The pages created using these tools will presumably (?) get fixed when
>their authors pass them through the new versions of the tools. Can anyone
>confirm/deny this?
>
>Misha
>


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:33 EDT