RE: Additions to code page 1252

From: Lori Brownell (loribr@microsoft.com)
Date: Thu Jul 10 1997 - 17:46:35 EDT


The Microsoft approach to growing a code page while never reassigning
existing code points is the same approach that Unicode takes. The
difference is that Unicode has written versions. While we can
certainly document the code pages with some sort of version number,
that doesn't help most programs or data files, which do not distinguish
between "versions" of encodings.
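A minimal sketch of the "only fill gaps, never reassign" rule, assuming a code page version can be modeled as a Python dict from byte value to Unicode code point. The names `CP1252_OLD`, `CP1252_NEW`, `REASSIGNED`, `is_compatible_growth` and `added_bytes` are hypothetical, and the tables hold only a few illustrative entries, not the real CP1252 definitions.

```python
# Hypothetical model: a code page "version" is a dict mapping byte values
# to Unicode code points. Only a handful of entries are shown.
CP1252_OLD = {
    0x41: 0x0041,  # LATIN CAPITAL LETTER A
    0xE9: 0x00E9,  # LATIN SMALL LETTER E WITH ACUTE
}

# The newer table keeps every old entry and fills previously unused bytes.
CP1252_NEW = {
    **CP1252_OLD,
    0x80: 0x20AC,  # EURO SIGN
    0x8E: 0x017D,  # LATIN CAPITAL LETTER Z WITH CARON
    0x9E: 0x017E,  # LATIN SMALL LETTER Z WITH CARON
}


def is_compatible_growth(old: dict[int, int], new: dict[int, int]) -> bool:
    """True if `new` preserves every mapping in `old` (no reassignment)."""
    return all(new.get(byte) == cp for byte, cp in old.items())


def added_bytes(old: dict[int, int], new: dict[int, int]) -> list[int]:
    """Byte values that `new` assigns but `old` left undefined."""
    return sorted(set(new) - set(old))


# Reassigning an existing byte would break data written with the old table.
REASSIGNED = {**CP1252_NEW, 0xE9: 0x0065}

print(is_compatible_growth(CP1252_OLD, CP1252_NEW))            # True
print([hex(b) for b in added_bytes(CP1252_OLD, CP1252_NEW)])   # ['0x80', '0x8e', '0x9e']
print(is_compatible_growth(CP1252_OLD, REASSIGNED))            # False
```

The check only works when both tables are known; a data file labeled just "CP1252" carries nothing that says which table it was written against, which is the versioning gap described above.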

Ken, we should discuss this further at the next UTC meeting. If I'm not
there personally, please discuss it with Murray and the other person
from my group who will be attending.

Thanks - Lori

> -----Original Message-----
> From: kenw@sybase.com [SMTP:kenw@sybase.com]
> Sent: Thursday, July 10, 1997 10:23 AM
> To: unicode@unicode.org
> Cc: Lori Brownell; kenw@sybase.com
> Subject: Re: Additions to code page 1252
>
> >
> > We plan on adding the following 2 mappings to the Windows code page
> > 1252 definition so that this encoding will be a superset of the new
> > ISO 8859 encoding.
> >
> > 0x8e = U+017d Latin Capital Letter Z With Caron
> > 0x9e = U+017e Latin Small Letter Z With Caron
> >
> > Thanks, Lori Brownell (LoriBr@Microsoft.com)
> >
>
> First of all, thanks for posting this information to the Unicode
> list. It is helpful to everyone to get advance notice of additions
> like this. I presume this is in addition to the addition of 0x80 =
> U+20AC EURO SIGN.
>
> And I don't want to seem to be shooting the messenger, but
> Lee Fryer-Davis and Tony Harminc have raised good points about this.
>
> Microsoft's approach to the Windows code pages seems to be that
> it is o.k. to grow them by gradual accretion of new characters
> filling in the gaps, without changing the code page identity or
> providing any visible (or even documented) versioning. While for
> many uses this is unobjectionable and provides a means of satisfying
> customer and/or vendor needs in an incremental way, for other
> purposes it causes major trouble.
>
> In particular, there has been a lot of discussion on this list just
> recently regarding the identification of CP1252 as a "charset" on
> the Internet. The way the MIME charset is defined, a change in
> repertoire for a character set implies a change in "charset", since
> it changes the way an octet stream is mapped into characters. The
> current situation on the Internet is that there are 2 CP1252's,
> and with these additions, now 3 CP1252's, all of which are chaotically
> and indeterminately related to ISO-8859-1, assumed as default by
> many browsers on many platforms. The net result is interoperability
> problems for the 0x80..0x9F characters in CP1252, and many visible
> "bugs" in documents containing Windows 1252 characters.
>
> Given that Microsoft has a major impact as a "producer" of
> code pages, and is also a major player in web browsers and web
> authoring tools, is there any possibility that Microsoft could take
> some ownership of this character set identity and versioning problem
> and get involved with the IETF folks wrestling with the IANA
> "charset" registry that is referenced by Internet standards?
> A little proactivity in the area might save a lot of Microsoft
> bashing in the Internet arena.
>
> --Ken Whistler
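The 0x80..0x9F interoperability problem can be seen by decoding the same octets under two charset labels. This sketch uses Python's built-in `cp1252` and `latin-1` codecs; note that Python's `cp1252` is a present-day table that already includes U+20AC and both Z WITH CARON mappings. The helper `code_points` is hypothetical.

```python
def code_points(data: bytes, charset: str) -> list[str]:
    """Decode `data` with the named codec and list the resulting code points."""
    return [f"U+{ord(ch):04X}" for ch in data.decode(charset)]


# EURO SIGN, capital Z with caron, small z with caron as encoded in CP1252.
sample = bytes([0x80, 0x8E, 0x9E])

print("cp1252 ", code_points(sample, "cp1252"))   # ['U+20AC', 'U+017D', 'U+017E']
print("latin-1", code_points(sample, "latin-1"))  # ['U+0080', 'U+008E', 'U+009E'] (C1 controls)

# Python's cp1252 codec still leaves some bytes in 0x80..0x9F unassigned,
# so strict decoding of, e.g., 0x81 raises an error.
try:
    bytes([0x81]).decode("cp1252")
except UnicodeDecodeError as exc:
    print("0x81 is undefined in cp1252:", exc.reason)
```

Under `latin-1` the three bytes come out as C1 control characters rather than the intended letters, roughly what a client defaulting to ISO-8859-1 does with text actually written in CP1252.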



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:35 EDT