Re: Usage of CP1252 characters on www.msnbc.com

From: Lars Henrik Mathiesen (thorinn@diku.dk)
Date: Tue Jul 08 1997 - 17:09:41 EDT


   Reply-To: kuhn@cs.purdue.edu ("Markus G. Kuhn")
   From: "Unicode Discussion" <unicode@unicode.org>
   Date: Tue, 8 Jul 1997 12:21:13 -0700 (PDT)

   Lars Henrik Mathiesen wrote on 1997-07-08 12:41 UTC:
> The user interface that I would prefer is:
> ...
> 1) Use Unicode numerical character references: ...
> 2) Use Unicode UTF-8: ...
> 3) Use only ISO Latin-1 characters: ...
> 4) Use native Windows character set (CP1252): ...
>
> What happened to the idea of using named character entities, as in
> <http://www.w3.org/pub/WWW/TR/WD-entities>? Someone did mention them,
> but no notice seemed to be taken...

   For good reason. They

     - are even less widely supported
     - need just another table in the implementation

The table is there already, for &gt; and friends (and the upper part
of Latin-1).

     - do not scale beyond CP1252
     - do not even support MES or WGL4
     - are just yet another small and arbitrarily chosen subset of
       Unicode that will contribute to the uncontrollable
       Unicode subset inflation

   and therefore do not look that attractive at all to me.

The problem at hand is exactly that a lot of people use CP1252 to
create documents, and want to export them to HTML. The subset is
already defined by usage. MES and WGL4 are straw men here; your own
solutions 3 and 4 are clearly insufficient to handle them.

If a user, today, wants to export a CP1252 document, emitting named
entities is as good as or better than any of your methods (provided
that my assumption about current Windows browsers grokking &ldquo;
et al. is true).
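
To make the export step concrete, here is a minimal sketch of such a
converter. It assumes Python's standard html.entities table (which
follows the HTML 4 entity names) as a stand-in for the names in the
draft cited above. The function name is hypothetical, and characters
without a named entity fall back to a numerical reference:

```python
import html.entities

def cp1252_to_named_entities(data: bytes) -> str:
    """Hypothetical helper: render CP1252-encoded plain text as ASCII-only HTML."""
    # Bytes that CP1252 leaves undefined (e.g. 0x81) raise UnicodeDecodeError here.
    text = data.decode("cp1252")
    out = []
    for ch in text:
        cp = ord(ch)
        # Assumption: the HTML 4 name table stands in for the draft's entity set.
        name = html.entities.codepoint2name.get(cp)
        if name is not None:
            out.append(f"&{name};")   # named entity; also covers & < > "
        elif cp < 128:
            out.append(ch)            # remaining ASCII passes through unchanged
        else:
            out.append(f"&#{cp};")    # no name available: numerical reference
    return "".join(out)

print(cp1252_to_named_entities(b"\x93Hi\x94"))
```

The output is pure ASCII, so it does not depend on the charset the
browser assumes for the page.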

If a user, tomorrow, wants to export an MES document, emitting Unicode
is the only viable solution. Asking the user to choose between NCRs
and UTF-8 is probably pointless.
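
For illustration only, the two Unicode forms can be produced from the
same text as sketched below. The function names are hypothetical; the
"xmlcharrefreplace" error handler is a standard Python codec feature
and emits decimal references:

```python
def as_ncrs(text: str) -> bytes:
    """Hypothetical helper: ASCII output with numerical character references."""
    # Every non-ASCII character becomes a decimal reference such as &#8220;
    return text.encode("ascii", errors="xmlcharrefreplace")

def as_utf8(text: str) -> bytes:
    """Hypothetical helper: the same text as raw UTF-8 bytes."""
    return text.encode("utf-8")

sample = "\u201cHi\u201d"   # left and right double quotation marks around "Hi"
print(as_ncrs(sample))
print(as_utf8(sample))
```

Both outputs carry the same characters; they differ only in whether
the receiving end must be told that the bytes are UTF-8.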

Lars Mathiesen (U of Copenhagen CS Dep) <thorinn@diku.dk> (Humour NOT marked)


