Re: Usage of CP1252 characters on www.msnbc.com

From: Markus G. Kuhn (kuhn@cs.purdue.edu)
Date: Tue Jul 08 1997 - 15:23:03 EDT

Next message: Markus G. Kuhn: "Re: Usage of CP1252 characters on www.msnbc.com"
Previous message: Kenneth Whistler: "EBCDIC"
In reply to: Lars Henrik Mathiesen: "Re: Usage of CP1252 characters on www.msnbc.com"
Next in thread: Gavin Nicol: "RE: Usage of CP1252 characters on www.msnbc.com"

Lars Henrik Mathiesen wrote on 1997-07-08 12:41 UTC:
> The user interface that I would prefer is:
> ...
> 1) Use Unicode numerical character references: ...
> 2) Use Unicode UTF-8: ...
> 3) Use only ISO Latin-1 characters: ...
> 4) Use native Windows character set (CP1252): ...
>
> What happened to the idea of using named character entities, as in
> <http://www.w3.org/pub/WWW/TR/WD-entities>? Someone did mention them,
> but no notice seemed to be taken...

For good reason. They

  - are even less widely supported
  - need yet another table in the implementation
  - do not scale beyond CP1252
  - do not even support MES or WGL4
  - are just yet another small and arbitrarily chosen subset of
    Unicode that will contribute to the uncontrollable
    Unicode subset inflation

and therefore do not look that attractive at all to me.
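
To illustrate the table argument, here is a minimal Python sketch, not
a reference implementation. It assumes Python's html.entities table
(the later HTML 4 entity set) as a stand-in for the table in the
WD-entities draft; the function name to_ascii_markup is made up for
this example. A named entity can only be emitted if the table happens
to contain the character, while an NCR can be computed directly from
the code point of any character.

```python
from html.entities import codepoint2name  # stand-in for the entity table


def to_ascii_markup(text):
    """Return text with every non-ASCII character replaced by markup.

    A named entity is used when the table has one; otherwise the
    function falls back to a decimal numeric character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                            # plain ASCII stays as-is
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])   # needs a table entry
        else:
            out.append("&#%d;" % cp)                  # works for any code point
    return "".join(out)


print(to_ascii_markup("caf\u00e9"))   # e with acute is in the table
print(to_ascii_markup("\ufbf9"))      # an Arabic ligature is not
```

The fallback branch is the point: once the table runs out, only the
numeric form remains.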

The named character entities - like NCRs - are just a mechanism that allows
you to stay in the ASCII world. In my opinion the ultimate solution will
be UTF-8 or UCS-2, because then ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA
ABOVE WITH ALEF MAKSURA ISOLATED FORM will be as much a normal member of
the base character set as LATIN CAPITAL LETTER A. I just do not recommend
UTF-8 yet, because editors for it are not yet really widespread
(except under Plan9).
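
The difference between the two approaches can be sketched in a few
lines of Python. This is only an illustration; the helper names as_ncr
and as_utf8_hex are made up for this example. An NCR spells the code
point out in ASCII markup, whereas UTF-8 encodes the character itself
as a byte sequence, so both sample characters are ordinary members of
the same encoded character set.

```python
import unicodedata


def as_ncr(ch):
    """Return the hexadecimal numeric character reference for ch."""
    return "&#x%X;" % ord(ch)


def as_utf8_hex(ch):
    """Return the UTF-8 encoding of ch as a lowercase hex string."""
    return ch.encode("utf-8").hex()


# LATIN CAPITAL LETTER A and the Arabic ligature at U+FBF9.
for ch in ("A", "\ufbf9"):
    # unicodedata.name() falls back to the given default if the
    # local Unicode database has no name for the character.
    name = unicodedata.name(ch, "<unnamed>")
    print(name)
    print("  NCR:  ", as_ncr(ch))
    print("  UTF-8:", as_utf8_hex(ch))
```

In UTF-8 the letter A stays a single ASCII byte, while the ligature
takes three bytes; neither needs any markup.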

NCRs are just a simpler and somewhat less convenient step towards the
ultimate solution. Providing intermediate solutions that are too
convenient but incomplete is the greater danger in the long term: they
will delay the ultimate solution and will just create mechanisms that
have to be supported ad infinitum for backwards compatibility.

Markus

-- 
Markus G. Kuhn, Computer Science grad student, Purdue
University, Indiana, USA -- email: kuhn@cs.purdue.edu


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:35 EDT