Re: Microsoft input method, 950, and Unicode mapping

From: Tex Texin (texin@progress.com)
Date: Tue Dec 18 2001 - 23:13:19 EST


Ken,

Thanks for commiserating.
Yes, I noticed the differences in mapping tables.
I am glad Sybase gave different character sets different names.
I am curious how you deal with Unicode and HKSCS in the private use
area, sometimes....
For that matter I wonder what a user in HK does when their Windows
operating system is upgraded and their files that had HKSCS characters
in the private use area now expect them in other locations.
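
One thing that can at least be checked on the Unicode side is whether a
file contains any private use code points at all. Here is a minimal
sketch in Python, assuming the data has already been decoded to Unicode;
the function names are hypothetical. It only flags that a private use
code point is present. It cannot tell whose private assignment (HKSCS or
anything else) was intended.

```python
# Private Use ranges defined by Unicode: the BMP area plus planes 15 and 16.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(ch):
    """True if the single character ch lies in a Unicode Private Use range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def find_private_use(text):
    """Return (index, 'U+XXXX') pairs for each private use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_private_use(ch)
    ]
```

A non-empty result is only a hint that the file may predate the upgrade
and needs a closer look.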

With respect to messy tables, and HKSCS and GB18030 in particular, it is
a damn shame that there is no entity making a case to governments and
others creating character set standards, that they not consider the set
defined until it is registered to ISO and Unicode, so some of the silly
mistakes get worked out first. A little press relations here, with
recent history and resulting problems as evidence and the corrections
that came about once registration was attempted, would show that working
these things out in committee is helpful and not a threat to national
sovereignty.

Oh well. Surely this won't happen again in 2002....
tex

Kenneth Whistler wrote:
>
> Tex,
>
> >
> > Thanks for this and the several private responses.
> >
> > For anyone interested, in addition to the Microsoft page:
> > http://www.microsoft.com/hk/hkscs/
> >
> > The HK Gov't has a web page, fonts and mapping tables:
> > http://www.info.gov.hk/digital21/eng/hkscs/introduction.html
>
> And to add to the chaos and confusion, note that the HKSCS
> patch for Windows Code Page 950 does not map exactly the
> same as the HK Government mapping table. And that the HK
> Government mapping table has at least a couple of blatant
> errors in it. And that the HKSCS patch for Windows Code Page 950
> (like Code Page 950 without the extension, but even more so)
> has duplicate mappings in it that need to be resolved in
> order to roundtrip through Unicode. And you have no guarantee
> that various vendors' attempts to sort out the HK Government
> mapping table and Windows Code Page 950 + HKSCS patch behavior
> will themselves produce matching results.
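
The duplicate-mapping problem described here can be checked mechanically.
This is a rough sketch in Python, assuming a mapping table has already
been parsed into (legacy code, Unicode code point) pairs; the function
name and the sample values are made up for illustration and are not taken
from any real table. It lists every Unicode code point that more than one
legacy code maps to, since those are the entries that cannot round-trip
without someone picking a winner.

```python
from collections import defaultdict


def find_roundtrip_conflicts(pairs):
    """Map each Unicode code point reached from more than one legacy code
    to the sorted list of legacy codes that compete for it."""
    by_unicode = defaultdict(set)
    for legacy, ucs in pairs:
        by_unicode[ucs].add(legacy)
    return {
        ucs: sorted(codes)
        for ucs, codes in by_unicode.items()
        if len(codes) > 1
    }


# Made-up sample: two legacy codes share one Unicode code point.
sample = [(0x01, 0x4E00), (0x02, 0x4E00), (0x03, 0x4E01)]
conflicts = find_roundtrip_conflicts(sample)
```

Running the same check over two vendors' tables and comparing the results
would show where their resolutions of the duplicates differ.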
>
> >
> > Oracle gave a nice paper at a recent Unicode conference:
> > http://www.unicode.org/iuc/iuc18/papers/b19.ppt
> >
> > It amazes me that in the year 2000, organizations are still creating
> > chaos by amending definitions of standards, especially code pages,
> > without giving the new creation its own name or some other way of
> > distinguishing it, and then on top of that creating multiple mapping
> > tables.
> >
> > I understand the desire to get new functionality into users' hands, but
> > would it have been a problem to rename either big5 or 950 to something
> > like big-6 or big-5hk or 950HK or 951?
>
> Sybase is now supporting "cp950" (+euro, by the way -- another addition
> that may or may not be supported in a particular Windows implementation,
> depending on date) and a separate "big5hk", so if you interoperate
> with Sybase, you should know what you are getting. However, like
> everybody else, it is hit or miss for us when a platform or other
> data announces itself to us as "cp950" or "big-5", whether it
> is with or without the HKSCS extensions.
>
> > So now we can't tell if big-5 or 950 will or won't have this data, or
> > even whether Unicode data will have these characters in the private use
> > area or elsewhere, or whether software that may be on the other end of
> > the pipe supports HKSCS or not, or even if their operating system has
> > the patch or not.
> >
> > Although "that which we call a rose by any other name would smell as
> > sweet", calling everything a rose makes it hard to know when you are
> > getting a rose.
>
> I think this was all part of a conspiracy for Chinese to catch up
> with Japanese, since the Chinese code pages (until now) didn't have
> a mess the scale of SJIS. But between HKSCS and GB 18030, they are
> making up for lost time.
>
> --Ken
>
> >
> > Here's hoping for less chaos in 2002!
> > tex

-- 
-------------------------------------------------------------
Tex Texin                    Director, International Business
mailto:Texin@Progress.com    Tel: +1-781-280-4271
the Progress Company         Fax: +1-781-280-4655
-------------------------------------------------------------
For a compelling demonstration for Unicode:
http://www.geocities.com/i18nguy/unicode-example.html


