Re: Additions to code page 1252

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jul 10 1997 - 13:23:21 EDT


>
> We plan on adding the following 2 mappings to the Windows code page 1252
> definition so that this encoding will be a superset of the new ISO 8859
> encoding.
>
> 0x8e = U+017d Latin Capital Letter Z With Caron
> 0x9e = U+017e Latin Small Letter Z With Caron
>
> Thanks, Lori Brownell (LoriBr@Microsoft.com)
>

First of all, thanks for posting this information to the Unicode
list. It is helpful to everyone to get advance notice of additions
like this. I presume these mappings come on top of the addition of
0x80 = U+20AC EURO SIGN.

I don't want to seem to be shooting the messenger, but
Lee Fryer-Davis and Tony Harminc have raised good points about this.

Microsoft's approach to the Windows code pages seems to be that
it is o.k. to grow them by gradual accretion of new characters
filling in the gaps, without changing the code page identity or
providing any visible (or even documented) versioning. While for
many uses this is unobjectionable and provides a means of satisfying
customer and/or vendor needs in an incremental way, for other
purposes it causes major trouble.

In particular, there has been a lot of discussion on this list just
recently regarding the identification of CP1252 as a "charset" on
the Internet. The way the MIME charset is defined, a change in
repertoire for a character set implies a change in "charset", since
it changes the way an octet stream is mapped into characters. The
current situation on the Internet is that there are 2 CP1252's,
and with these additions, now 3 CP1252's, all of which are chaotically
and indeterminately related to ISO-8859-1, which is assumed as the
default by many browsers on many platforms. The net result is
interoperability problems for the 0x80..0x9F characters in CP1252,
and many visible "bugs" in documents containing Windows 1252 characters.

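To make the 0x80..0x9F problem concrete, here is a minimal Python
sketch (not part of the original message) that decodes the same three
octets under two charset labels. It assumes the "cp1252" codec shipped
with the Python runtime, which reflects a later revision of the table
that already contains the euro and Z-caron mappings; SAMPLE and
decode_both are names made up for this illustration.

```python
# The three octets discussed in this thread, all in the 0x80..0x9F range.
SAMPLE = bytes([0x80, 0x8E, 0x9E])


def decode_both(data):
    """Decode the same octets as CP1252 and as ISO-8859-1."""
    # Assumption: the runtime's cp1252 table includes 0x80, 0x8E and 0x9E.
    as_cp1252 = data.decode("cp1252")
    # ISO-8859-1 maps every octet to the code point with the same value,
    # so 0x80..0x9F come out as C1 control characters.
    as_latin1 = data.decode("iso-8859-1")
    return as_cp1252, as_latin1


cp1252_text, latin1_text = decode_both(SAMPLE)

# Print the code point each label assigns to each octet.
for octet, a, b in zip(SAMPLE, cp1252_text, latin1_text):
    print(f"0x{octet:02X}: cp1252 -> U+{ord(a):04X}, "
          f"iso-8859-1 -> U+{ord(b):04X}")
```

A receiver that falls back to ISO-8859-1 gets U+0080, U+008E and
U+009E for these octets rather than the euro sign and the two Z-caron
letters. A decoder built from an older, unversioned copy of the 1252
table would have no mapping for them at all, which is the
identification problem described above.
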
Given that Microsoft has a major impact as a "producer" of
code pages, and is also a major player in web browsers and web
authoring tools, is there any possibility that Microsoft could take
some ownership of this character set identity and versioning problem
and get involved with the IETF folks wrestling with the IANA
"charset" registry that is referenced by Internet standards?
A little proactivity in this area might save a lot of Microsoft
bashing in the Internet arena.

--Ken Whistler


