Re: Microsoft Code Page Tables

From: Mark Davis (markdavis@ispchannel.com)
Date: Thu Apr 06 2000 - 21:45:03 EDT


BTW, the pages on http://www.microsoft.com/globaldev don't actually reflect reality (i.e. what you get if you call the Windows APIs).

For example, http://www.microsoft.com/globaldev/reference/sbcs/1251.htm lists 0x98 as undefined. In reality, it maps to U+0098, as you can see at http://oss.software.ibm.com/icu/charset/CharMaps-HTML/windows-1251-NT4.0.5.html

Similarly, http://www.microsoft.com/globaldev/reference/dbcs/932.htm says that 0xFF is unassigned. In reality, it maps to U+F8F3, as shown at http://oss.software.ibm.com/icu/charset/CharMaps-HTML/windows-932-NT4.0.5.html

[These tables were generated by programmatically calling the Windows APIs on Windows NT 4.0, Service Pack 5.]
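
The kind of comparison described above can be sketched roughly as follows. This is only an illustration, not the tool that produced the tables: it assumes Python's built-in "cp1251" codec is an acceptable stand-in for the documented table, and it hard-codes the one observed value quoted in this message. The function names documented_table and diff_tables are made up for the example.

```python
# Sketch: diff a documented single-byte table against observed behaviour.
# Assumption: Python's "cp1251" codec stands in for the documented table.
# The observed value for 0x98 is the one reported in this message; nothing
# here actually calls the Windows APIs.

def documented_table(codec):
    """Map each byte 0x00-0xFF to a code point, or None if the codec rejects it."""
    table = {}
    for b in range(256):
        try:
            table[b] = ord(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            table[b] = None
    return table

def diff_tables(documented, observed):
    """Return (byte, documented_cp, observed_cp) for every byte that differs."""
    return [(b, documented[b], observed[b])
            for b in sorted(documented)
            if documented[b] != observed[b]]

def fmt(cp):
    """Format a code point as U+XXXX, or 'undefined' for None."""
    return "undefined" if cp is None else f"U+{cp:04X}"

documented = documented_table("cp1251")
observed = dict(documented)
observed[0x98] = 0x0098  # value reported above for NT 4.0 SP5

for b, doc, obs in diff_tables(documented, observed):
    print(f"0x{b:02X}: documented {fmt(doc)}, observed {fmt(obs)}")
```

In a real run, the observed table would be filled in by querying the platform's conversion API for every byte rather than by copying the documented table and patching one entry.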

This is not to single out Microsoft -- most vendors' documented mapping tables deviate at least slightly from what their APIs actually do. One of the goals of the character mapping project is to determine precisely what the character mappings are, so that we can guarantee accurate transcoding between Unicode and legacy character sets. This is crucial in a connected world -- if I generate an XML document or HTML page on one machine, I need to be *very* sure that the mapping I used is precisely the one the receiving machine will apply when it interprets the data.
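
One simple way to probe a mapping for trouble spots is a round-trip check: convert each byte to Unicode and back, and flag anything that fails or comes back changed. The sketch below is a minimal illustration under the assumption that Python's bundled codecs are the mapping under test; round_trip_failures is a hypothetical helper name, and a real check would run against the conversion routines on each machine involved.

```python
# Sketch: find bytes that do not survive bytes -> Unicode -> bytes.
# Assumption: the mapping under test is one of Python's built-in
# single-byte codecs.

def round_trip_failures(codec):
    """Return the byte values that fail to round-trip under `codec`."""
    failures = []
    for b in range(256):
        raw = bytes([b])
        try:
            if raw.decode(codec).encode(codec) != raw:
                failures.append(b)
        except UnicodeError:
            # The codec has no mapping for this byte (or its decoded form).
            failures.append(b)
    return failures

print([hex(b) for b in round_trip_failures("cp1251")])
```

A byte that shows up in this list on one machine but not on another is exactly the kind of disagreement that has to be pinned down before the two can exchange data safely.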

Mark

Lori Brownell wrote:

> Try http://www.microsoft.com/globaldev
>
> -----Original Message-----
> From: Tom Emerson [mailto:Tree@basistech.com]
> Sent: Thursday, April 06, 2000 11:28 AM
> To: Unicode List
> Subject: Microsoft Code Page Tables
>
> I've been unable to find the tables showing the characters in each of
> Microsoft's Code Pages. Specifically I would like to see the table for CP874
> (which is supposed to be the same as TIS 620).
>
> TIA,
>
> -tree
>
> --
> Tom Emerson Basis Technology Corp.
> Language Hacker http://www.basistech.com
> "Beware the lollipop of mediocrity: lick it once and you suck forever"


