Re: Usage of CP1252 characters on

From: Lars Henrik Mathiesen (
Date: Tue Jul 08 1997 - 08:41:10 EDT

   Reply-To: ("Markus G. Kuhn")
   From: "Unicode Discussion" <>
   Date: Mon, 7 Jul 1997 20:13:32 -0700 (PDT)

   The user interface that I would prefer is:
     1) Use Unicode numerical character references: ...
     2) Use Unicode UTF-8: ...
     3) Use only ISO Latin-1 characters: ...
     4) Use native Windows character set (CP1252): ...

What happened to the idea of using named character entities, as in
<>? Someone did mention them, but no notice seemed to be taken...

This set claims to include the extra characters from CP1252 (except
EURO); it certainly has the quotes that started the discussion.
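
A rough way to check that kind of claim is to walk the CP1252 bytes
0x80-0x9F and look each one up in an entity table. The sketch below is
only an illustration: it uses the HTML 4 table that ships with Python
(`html.entities.codepoint2name`), which is an assumption on my part and
not necessarily the set referred to above (the HTML 4 table does carry
`euro`, for one). The function name `cp1252_extras` is made up for the
example.

```python
from html.entities import codepoint2name

def cp1252_extras():
    """List (byte, Unicode code point, entity name or None) for the
    CP1252 bytes 0x80-0x9F that are actually assigned."""
    rows = []
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # byte is unassigned in Python's cp1252 codec
        rows.append((byte, ord(ch), codepoint2name.get(ord(ch))))
    return rows

for byte, codepoint, name in cp1252_extras():
    print("0x%02X  U+%04X  %s" % (byte, codepoint, name or "(no named entity)"))
```

The quotes that started the discussion show up as 0x91-0x94 mapped to
`lsquo`, `rsquo`, `ldquo` and `rdquo`.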

I think this representation would be suitable as the default export
format of Windows HTML editors. Unlike the Unicode representations, it
probably works in some older Windows browsers(*), and it will also
work on non-Windows browsers once they start supporting Unicode fonts.
The best of all worlds, and no need to present the user with an
unwanted choice.
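
As a minimal sketch of what such a default export could look like,
assuming the editor holds its text as CP1252 bytes: decode them, keep
everything that fits in ISO Latin-1, and write the rest as named
entities, falling back to numeric character references where no name
exists. The function name `export_with_entities` is hypothetical, the
entity table is the HTML 4 one bundled with Python, and escaping of
"&" and "<" is deliberately left out to keep the example short.

```python
from html.entities import codepoint2name

def export_with_entities(text):
    """Replace every character outside ISO Latin-1 with a named entity
    when the HTML 4 table has one, else with a numeric reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFF:
            out.append(ch)                              # Latin-1: leave as is
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # e.g. &ldquo;
        else:
            out.append("&#%d;" % cp)                    # numeric fallback
    return "".join(out)

# 0x93 and 0x94 are the CP1252 double quotation marks.
raw = b"\x93quoted stuff\x94"
print(export_with_entities(raw.decode("cp1252")))
# prints: &ldquo;quoted stuff&rdquo;
```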

Of course, this representation will cause non-Windows users to see
quotes like &ldquo;quoted stuff&rdquo;, instead of ?quoted stuff? or
just quoted stuff (without quotes); but hopefully that will cause them
to complain until their systems get upgraded, instead of suing hapless
journalists for misrepresentation.

(BTW, "sbquo" for `single low-9 quotation mark' seems odd: why not
"bsquo", when the others are "lsquo", "rsquo", "ldquo", "rdquo" and
"bdquo"? Perhaps "bsquo" is taken for something else in ISO 8879?)

Lars Mathiesen (U of Copenhagen CS Dep) <> (Humour NOT marked)

(*) I regularly see &dagger; used in HTML converted from PDF (Cisco
online manuals, to be specific); Netscape (Navigator 3.01 Gold) on
UNIX just shows the string "&dagger;", but Windows browsers presumably
show the proper glyph --- if not, I imagine that Cisco would use
another converter.

Ns3.01G also shows a question mark for unknown numeric character
references; this loses information, but it is still better than the
Windows-specific codepoints which it totally ignores.
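
One plausible reason the Windows-specific codepoints vanish (this is my
assumption, not something checked against Netscape itself): read as ISO
Latin-1, a byte such as 0x93 is a C1 control character with nothing to
draw, whereas the same byte read as CP1252 is a real quotation mark.
The sketch below only shows that difference in interpretation; the
helper name `describe` is made up.

```python
import unicodedata

def describe(raw, encoding):
    """Decode a single byte and return (code point, general category)."""
    ch = raw.decode(encoding)
    return "U+%04X" % ord(ch), unicodedata.category(ch)

byte = b"\x93"
print(describe(byte, "cp1252"))   # left double quotation mark
print(describe(byte, "latin-1"))  # C1 control character
```

Category `Cc` marks a control character; `Pi` marks an opening
punctuation quote.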

