Re: GB18030

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Sep 27 2001 - 17:47:15 EDT


Yung-Fong Tang wrote:
> ...
> > http://www-106.ibm.com/developerworks/library/u-china.html
> >
> > Markus Scherer's excellent documentation of GB 18030, with
> > code snippets and pointer to a complete ICU implementation.
>
> That paper itself does not specify any detailed mapping table.

True, but it explains that they are treated algorithmically, and how to do that.
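As an illustration of the algorithmic treatment, the supplementary planes are the simplest case. U+10000..U+10FFFF map linearly onto the four-byte sequences 90 30 81 30..E3 32 9A 35. The byte ranges of a four-byte sequence are 81..FE, 30..39, 81..FE, 30..39, so each sequence can be read as a mixed-radix number (radices 126, 10, 126, 10). The following is a minimal sketch of that arithmetic, not ICU's code; the function names and the constant name are hypothetical.

```python
# Hypothetical name: linear index of 90 30 81 30, the sequence for U+10000.
# ((0x90 - 0x81) * 10 * 126 * 10) == 189000 == 0x2E248
SUPPLEMENTARY_BASE = 0x2E248


def gb18030_encode_supplementary(cp: int) -> bytes:
    """Map U+10000..U+10FFFF to a GB 18030 four-byte sequence by arithmetic."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    linear = SUPPLEMENTARY_BASE + (cp - 0x10000)
    # Peel off mixed-radix digits from the last byte to the first.
    linear, d4 = divmod(linear, 10)
    linear, d3 = divmod(linear, 126)
    linear, d2 = divmod(linear, 10)
    return bytes([0x81 + linear, 0x30 + d2, 0x81 + d3, 0x30 + d4])


def gb18030_decode_supplementary(seq: bytes) -> int:
    """Inverse of gb18030_encode_supplementary; rejects non-supplementary input."""
    if len(seq) != 4:
        raise ValueError("expected a four-byte sequence")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a GB 18030 four-byte sequence")
    linear = (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    cp = 0x10000 + (linear - SUPPLEMENTARY_BASE)
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("sequence is outside the supplementary range")
    return cp


print(gb18030_encode_supplementary(0x10000).hex())   # 90308130
print(gb18030_encode_supplementary(0x10FFFF).hex())  # e3329a35
```

No table lookup is involved, which is why a mapping table for this range only needs to record its endpoints.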

> I look at
> http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml .
>
> It is interesting that the mapping between U+10000 and U+10FFFF was checked
> in only 5 weeks ago, in version 1.3.

We have had this same, correct mapping table posted elsewhere on our server since February, I believe.

When we imported the .xml mapping tables into our newish charset CVS repository, we accidentally also ran the tool that generates .xml from our internal format on this one. That does not work, since the internal 18030 file is missing all the algorithmic parts (we don't have an equivalent of the <range> element). This is the one file that we cannot fully generate from our internal table...
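To make concrete what a <range> element stands for, here is a rough sketch of expanding one into individual mappings: given the first and last code point and the first byte sequence, every following code point gets the next four-byte sequence in linear order. The function names and the three-argument signature are hypothetical simplifications. The XML element also carries bounds (such as the bMin/bMax byte limits), which this sketch replaces by hard-coding GB 18030's four-byte structure.

```python
from typing import Iterator, Tuple

# Assumed GB 18030 four-byte structure: byte ranges 81..FE, 30..39, 81..FE, 30..39.
MINS = (0x81, 0x30, 0x81, 0x30)
RADICES = (126, 10, 126, 10)


def to_linear(seq: bytes) -> int:
    """Read a four-byte sequence as a mixed-radix number (81 30 81 30 -> 0)."""
    value = 0
    for byte, lo, radix in zip(seq, MINS, RADICES):
        value = value * radix + (byte - lo)
    return value


def from_linear(value: int) -> bytes:
    """Inverse of to_linear; raises if value exceeds FE 39 FE 39."""
    if value < 0:
        raise ValueError("negative linear index")
    digits = []
    for lo, radix in reversed(list(zip(MINS, RADICES))):
        value, digit = divmod(value, radix)
        digits.append(lo + digit)
    if value != 0:
        raise ValueError("linear index out of four-byte range")
    return bytes(reversed(digits))


def expand_range(u_first: int, u_last: int, b_first: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (code point, byte sequence) pairs for a linear range (hypothetical helper)."""
    start = to_linear(b_first)
    for offset in range(u_last - u_first + 1):
        yield u_first + offset, from_linear(start + offset)


for cp, seq in expand_range(0x10000, 0x10002, bytes.fromhex("90308130")):
    print(f"U+{cp:04X} -> {seq.hex(' ').upper()}")
```

A table that stores only individual mappings has no place for such a rule, so regenerating the .xml from it necessarily loses these ranges.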

I sent an email to this list 5 weeks ago pointing out this mistake. Sorry for the confusion.

> ...
> looks like I beat ICU by checking in my mapping table on April 9 (to
> mozilla), 10 days before they checked in their first version of the GB18030
> xml mapping table :)

I am sorry to disappoint you. ICU 1.7, released in December 2000, had the GB 18030 converter. I implemented it in October, and updated it with the new mapping table from 2000-nov-30 on that same day. That all includes support for the supplementary planes!
:-)

> I can probably still claim the first open source
> project to support GB18030 to Unicode conversion, although I didn't
> do anything beyond the BMP ....

Nope ;-)

markus



This archive was generated by hypermail 2.1.2 : Thu Sep 27 2001 - 16:33:57 EDT