Two new characters added to KS X 1001 in Dec. 1998

From: Jungshik Shin (jshin@mailaps.org)
Date: Mon Apr 22 2002 - 19:51:11 EDT

Previous message: David Starner: "Re: browsers and unicode surrogates"

Note to subscribers of the Unicode and Linux-utf8 mailing lists:

The following message is about two new characters added to the South
Korean (ROK) national coded character set standard KS X 1001 in December
1998. Although this change is not directly related to the two lists I'm
copying this to, I'm doing so in the hope that this 'news' will reach
as many engineers in charge of supporting Korean at as many companies as
possible. According to Bruno Haible (the maintainer of libiconv), this
change does not seem to have been reflected in a number of platforms
and products.

It'd be nice if you could direct your questions/replies to me rather
than to the two lists.

Thank you,

On Sat, 20 Apr 2002, Bruno Haible wrote:

Hi Bruno,

Thank you for your reply.

> Jungshik Shin writes:
>
> > I've just found that xc/lib/X11/lcUniConv/ksc5601.h had not been
> > updated to reflect the change made in the standard at the end of 1998.
> > Could you update it in both XF86 and libiconv? I thought you had already
> > done that in libiconv because two characters had been added to glibc
> > 2.2.x. They are:
> >
> > U+20AC at row 2, column 70 (0x2266)
> > U+00AE at row 2, column 71 (0x2267)
> >
> > KSX1001.TXT.gz and JOHAB.TXT.gz at http://jshin.net/faq/ have been
> > updated to reflect this change.

> Thanks for telling me about problems in ksc5601.h, contained in both
> libX11 and GNU libiconv.
>
> Can you give a little more evidence/details about the standard change that
> you mention? The charmaps of EUC-KR on AIX, Solaris, Java don't
> contain the change you mention
> (see http://oss.software.ibm.com/cvs/icu/charset/data/ucm/),

Ooops. Then I'll have to report the change to Sun (Solaris and
JDK) and IBM (ICU and AIX). As a shortcut, I'm copying this
message to the Unicode mailing list in the hope that engineers from Sun, IBM,
Oracle, Sybase, Apple and so forth will take note. If necessary, I'll
contact them separately. (A week or so ago, I wrote to a Sun engineer, but
haven't heard back yet.) Anyway, Perl::Encode already has them in its
EUC-KR, JOHAB, CP949 and ISO-2022-KR encodings. I filed a Mozilla bug (134749)
and made a patch for it, and Ken Lunde at Adobe was notified of the
change so that the CMap files for Adobe Korean fonts will include them soon.
Besides Solaris, JDK, ICU and AIX, various databases (commercial or not),
MacOS (apparently, Apple hasn't updated its Korean mapping), Python,
and Tcl have to update their mapping tables as well.
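For whoever updates one of these tables, here is a minimal sketch of how the two row/column positions translate into codes. It assumes only the usual KS X 1001 conventions: each of row and column is offset by 0x20 to get the 94x94 (GL) code, and EUC-KR places that code in GR by setting the high bit of both bytes. The function names are hypothetical, for illustration only.

```python
"""Convert a KS X 1001 row/column position into its 94x94 code and EUC-KR bytes."""

# The two characters reported as added to KS X 1001 in December 1998
# (row, column, Unicode scalar value), as listed in the quoted message.
ADDED_1998 = [
    (2, 70, 0x20AC),  # EURO SIGN
    (2, 71, 0x00AE),  # REGISTERED SIGN
]


def ksx1001_code(row: int, col: int) -> int:
    """Return the two-byte GL code (0x2121-0x7E7E) for a row/column pair."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError(f"row/column out of range: {row}, {col}")
    return ((row + 0x20) << 8) | (col + 0x20)


def euckr_bytes(row: int, col: int) -> bytes:
    """EUC-KR puts KS X 1001 in GR: set the high bit of both bytes."""
    code = ksx1001_code(row, col) | 0x8080
    return bytes([code >> 8, code & 0xFF])


if __name__ == "__main__":
    for row, col, ucs in ADDED_1998:
        print(f"row {row}, col {col}: KS X 1001 0x{ksx1001_code(row, col):04X}, "
              f"EUC-KR 0x{euckr_bytes(row, col).hex().upper()}, U+{ucs:04X}")
```

Run as a script, this prints 0x2266/0xA2E6 for U+20AC and 0x2267/0xA2E7 for U+00AE, i.e. the EUC-KR byte pairs a charmap would need to add.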

> and I can find no trace of such a change on various websites. All info
> about these 2 character additions appears to originate from you.
> Unfortunately I have learned that in this table patchwork business I
> have to rely on several independent sources.

The story goes like this. Sometime last fall, PARK Won-kyu
<wonkyu@chem.skku.ac.kr> noticed that the EUC-KR charmap in glibc 2.2.x
has two additional characters not found in my copy of KSX1001.TXT
(at http://jshin.net/faq/KSX1001.TXT.gz). He asked me about them
on the hangul-patch@kldp.org mailing list. I forwarded his message
to Prof. GIM Geongseog (KIM Keyongseok) at Pusan National University
(<gimgs@asadal.cs.pusan.ac.kr>), who represents South Korea (ROK)
in ISO/IEC JTC1/SC2/WG2 and SC22/WG20. (On the JTC1/SC2/WG2 web page, you
can find several of his responses, on behalf of ROK, to North Korean
requests to rearrange the code points of Korean Hangul syllables in
ISO/IEC 10646 and Unicode according to the DPRK dictionary sorting order
and to add conjoining Jamos to the U+1100 block.) He replied that two
characters were indeed added to KS X 1001 in December 1998. He also
mentioned that one more character (the Korean zip code sign) would be
added sometime this year. I can assure you that he is definitely an
authoritative source on the KS X series of standards.

Another piece of 'evidence' (?) is the Windows-949 mapping table
maintained by Microsoft (available at
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT).
It seems to have undergone a few revisions, and the newest version includes
the two characters at the corresponding EUC-KR positions. As you know well,
CP949 (Windows-949, Unified Hangul Code) is upward compatible with EUC-KR
and has 8,822 additional Hangul syllables outside the EUC range (where the
1st and 2nd bytes are both in 0xA1-0xFE). Therefore, by subtracting those
8,822 additional Hangul syllables from the CP949<->Unicode mapping, we
should get the EUC-KR<->Unicode mapping. (I'm aware that the result differs
from the EUC-KR portion of the MacKorean<->Unicode mapping, which uses
'half-width' characters whenever possible.) Sure, Microsoft could have
added other characters to some unused slots in the EUC-KR range, as Apple
did in MacOS Korean. However, if we trust Prof. Gim (which I'm sure we
can), that's not the case.
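The subtraction argument can be sketched in a few lines. This assumes the tab-separated two-column format used by the vendor tables on ftp.unicode.org (native code, Unicode value, optional '#' comment); the sample lines below are hand-written in that format rather than copied from CP949.TXT, and all names are hypothetical. It keeps single bytes below 0x80 plus double-byte codes whose two bytes are both in 0xA1-0xFE, and counts the Hangul syllables it drops.

```python
"""Derive an EUC-KR <-> Unicode table from a CP949 mapping by dropping UHC extensions."""

# 11,172 modern Hangul syllables (U+AC00..U+D7A3) minus the 2,350 in KS X 1001
# leaves the 8,822 syllables that CP949 adds outside the EUC-KR range.
UHC_EXTRA_SYLLABLES = 11172 - 2350

# Hand-written sample lines in the vendor-table format (not the real file).
SAMPLE = """\
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0x8141\t0xAC02\t#HANGUL SYLLABLE GAGG
0xA2E6\t0x20AC\t#EURO SIGN
0xA2E7\t0x00AE\t#REGISTERED SIGN
0xB0A1\t0xAC00\t#HANGUL SYLLABLE GA
""".splitlines()


def parse_mapping(lines):
    """Parse 'CODE<TAB>UCS<TAB>#comment' lines into {code: ucs}."""
    table = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank, comment-only, or unmapped code
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def in_euckr_range(code: int) -> bool:
    """True for single bytes below 0x80 or two bytes both in 0xA1-0xFE."""
    if code < 0x80:
        return True
    lead, trail = code >> 8, code & 0xFF
    return 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE


def euckr_from_cp949(cp949):
    """Keep only the entries that fall inside the EUC-KR code space."""
    return {code: ucs for code, ucs in cp949.items() if in_euckr_range(code)}


def dropped_syllables(cp949):
    """Count precomposed Hangul syllables mapped outside the EUC-KR range."""
    return sum(1 for code, ucs in cp949.items()
               if not in_euckr_range(code) and 0xAC00 <= ucs <= 0xD7A3)


if __name__ == "__main__":
    cp949 = parse_mapping(SAMPLE)
    for code, ucs in sorted(euckr_from_cp949(cp949).items()):
        print(f"0x{code:X} -> U+{ucs:04X}")
    print("dropped Hangul syllables:", dropped_syllables(cp949))
```

Fed the full CP949.TXT instead of the sample, `dropped_syllables` should come out to `UHC_EXTRA_SYLLABLES` (8,822) if the argument in the message holds, and the surviving table should contain 0xA2E6 and 0xA2E7.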

To be 100% sure, I'd love to have a PDF version of KS X 1001:1998 with
these two characters added. Unfortunately, <http://standard.ksa.or.kr>
doesn't sell KS X 1001 in PDF. (I have a hard copy of KS C 5601-1992/
KS X 1001:1997.) I'll probably have to ask a friend in Seoul to buy a
paper copy of KS X 1001 and send it to me.

Now it becomes interesting. Who added two characters to EUC-KR
charmap in glibc 2.2.x? I thought you had done that, and I was 'ashamed
of' not having noticed a change you had already found. ;-)
Apparently, you didn't. Then, Ulrich must have done it, right?
I can't think of anyone else....

Anyway, I hope I presented enough evidence to convince you.

Regards,

Jungshik Shin


This archive was generated by hypermail 2.1.2 : Mon Apr 22 2002 - 20:36:14 EDT