8859-1, 8859-15, 1252 and Euro

From: Tim Greenwood (greenwood@openmarket.com)
Date: Mon Feb 07 2000 - 17:58:11 EST

Pretty much all of the pages on the web, and the browsers, ignore the
differences between ISO-8859-1 and Windows code page 1252. I cannot even
find a MIME name for CP1252 -
http://www.isi.edu/in-notes/iana/assignments/character-sets has 1250, 1251,
1253 upwards, but no 1252.

Internet Explorer V5 does differentiate between Western European (Windows)
and Western European (ISO), selecting the former by default and the latter
when 8859-1 is explicitly requested. However, it displays byte 0x80 as the
Euro sign in both, even though the Euro is not a character in 8859-1 (0x80
there is a C1 control code).
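To make the difference concrete, here is a small sketch using Python's
standard codecs (the names "cp1252", "latin-1" and "iso8859_15" are Python
codec names, not MIME names). It decodes the same bytes under each table:
0x80 is the Euro only in CP1252, and in ISO-8859-15 the Euro sits at 0xA4,
where ISO-8859-1 has the generic currency sign.

```python
# Decode single bytes under three Western European tables.
raw_80 = b"\x80"
raw_a4 = b"\xa4"

as_cp1252 = raw_80.decode("cp1252")         # U+20AC EURO SIGN
as_latin1 = raw_80.decode("latin-1")        # U+0080, a C1 control character
as_8859_15 = raw_a4.decode("iso8859_15")    # U+20AC EURO SIGN
as_latin1_a4 = raw_a4.decode("latin-1")     # U+00A4 CURRENCY SIGN

for label, ch in [("cp1252 0x80", as_cp1252),
                  ("latin-1 0x80", as_latin1),
                  ("iso8859_15 0xA4", as_8859_15),
                  ("latin-1 0xA4", as_latin1_a4)]:
    print(f"{label}: U+{ord(ch):04X}")
```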

So what should a system that stores all data in Unicode, and converts it
for web output, do with U+20AC? The formally correct process would seem to
be: convert it to 0x80 only for CP1252 (and most of the other CP12xx sets;
CP1251 puts it at 0x88), to 0xA4 for ISO-8859-15, and to the 'not a
character in this set' substitute for ISO-8859-1. This may be formally
correct, but it would not help the majority of users. For them we would
convert to 0x80 for ISO-8859-1 as well; it works even though it is 'wrong'.
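The two policies can be sketched as follows. This is a minimal
illustration, assuming Python's built-in codec tables decide where the Euro
goes, and assuming errors="replace" (which substitutes "?") as the 'not a
character in this set' marker. The function name encode_for_web and the
lenient flag are hypothetical.

```python
import codecs

EURO = "\u20ac"

def encode_for_web(text: str, charset: str, lenient: bool = False) -> bytes:
    """Encode text for web output in `charset` (a sketch, not a spec).

    Strict mode defers to Python's codec tables: cp1252 puts U+20AC at 0x80,
    iso8859_15 at 0xA4, and ISO-8859-1 has no Euro, so errors="replace"
    substitutes "?" (an assumed stand-in for 'not a character in this set').

    Lenient mode (hypothetical flag) maps U+20AC to 0x80 for ISO-8859-1,
    which browsers display as a Euro even though it is not valid 8859-1.
    """
    # codecs.lookup normalizes aliases such as "latin-1" and "ISO-8859-1".
    is_latin1 = codecs.lookup(charset).name == "iso8859-1"
    if lenient and is_latin1:
        # U+0080 encodes to byte 0x80 in ISO-8859-1.
        text = text.replace(EURO, "\x80")
    return text.encode(charset, errors="replace")

for cs in ("cp1252", "iso-8859-15", "iso-8859-1"):
    print(cs, encode_for_web("5" + EURO, cs))
print("iso-8859-1 lenient", encode_for_web("5" + EURO, "iso-8859-1", lenient=True))
```

In strict mode ISO-8859-1 output gets "?" for the Euro; in lenient mode it
gets byte 0x80, which is the 'works even though wrong' option.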

Have others faced this? (Perhaps you simply avoid the problem by using a
numeric or named character reference, &#8364; or &euro;.)
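The character-reference route can be sketched with Python's codec error
handlers. errors="xmlcharrefreplace" is built in and emits decimal numeric
references; the handler name "html_named_refs" registered below is a
hypothetical name for a custom handler that prefers &euro;.

```python
import codecs

def encode_with_refs(text: str, charset: str) -> bytes:
    # Characters the target charset lacks (e.g. U+20AC in ISO-8859-1)
    # become decimal numeric references such as "&#8364;".
    return text.encode(charset, errors="xmlcharrefreplace")

def _named_euro_refs(err: UnicodeEncodeError) -> tuple[str, int]:
    # Hypothetical custom handler: prefer the HTML entity &euro; for U+20AC,
    # fall back to a numeric reference for any other unencodable character.
    bad = err.object[err.start:err.end]
    refs = "".join("&euro;" if ch == "\u20ac" else f"&#{ord(ch)};" for ch in bad)
    return refs, err.end

codecs.register_error("html_named_refs", _named_euro_refs)

print(encode_with_refs("5\u20ac", "iso-8859-1"))
print("5\u20ac".encode("iso-8859-1", errors="html_named_refs"))
```

Either way the output is plain ASCII for the Euro, so it survives whichever
charset label the page carries.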


