Re: New BMP characters (was Re: [very OT] Documentation: beyond

From: Thomas Chan (
Date: Wed Feb 21 2001 - 18:43:24 EST

On Wed, 21 Feb 2001, Jungshik Shin wrote:

> On Wed, 21 Feb 2001, Thomas Chan wrote:
> > The unihan.txt file ver 3.0b1 (1999.7.2) lists four K- sources as:
> > K0 KS C 5601-1987
> > K1 KS C 5657-1991
> > K2 PKS C 5700-1 1994
> > K3 PKS C 5700-2 1994
> > It's very clear what K0 and K1 are, and they are given as GR ranges
> > arranged by pronunciation, and it is okay that these ranges overlap, since
> > K0 and K1 are two different character sets.
> Hmm, it's not a big deal but I wonder why they're given as GR ranges
> instead of just row-column values (or GL). Somebody must have mixed
> up .......

Sorry, this was my mistake. K0 and K1 are given as GL ranges, not GR.
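
For anyone comparing the notations, here is a minimal sketch of how GL, GR,
and row-column values relate for a two-byte 94x94 set such as KS C 5601. The
function names are made up for illustration and are not taken from
unihan.txt or any library; the only facts assumed are that GL bytes run
0x21..0x7E, that GR is the same code with the high bit set on each byte, and
that row/column numbers are the GL bytes minus 0x20.

```python
def gl_to_gr(code: int) -> int:
    """Set the high bit of both bytes: GL 0x2121..0x7E7E -> GR 0xA1A1..0xFEFE."""
    return code | 0x8080


def gr_to_gl(code: int) -> int:
    """Clear the high bit of both bytes: GR -> GL."""
    return code & 0x7F7F


def gl_to_row_col(code: int) -> tuple[int, int]:
    """Convert a two-byte GL code to 1-based (row, column) in the 94x94 grid."""
    hi, lo = code >> 8, code & 0xFF
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a two-byte GL code: {code:#06x}")
    return hi - 0x20, lo - 0x20


if __name__ == "__main__":
    # The end points of the K2 and K3 ranges quoted in this thread.
    for gl in (0x2121, 0x7530, 0x3771):
        print(f"GL {gl:#06x}  GR {gl_to_gr(gl):#06x}  row-col {gl_to_row_col(gl)}")
```

A value with either byte at 0xA1 or above is therefore GR and anything within
0x2121..0x7E7E is GL, which is how the two can be told apart at a glance.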

> > K2 has what appears to be GL ranges given for it (0x2121 .. 0x7530), and
> > arranged by radical+strokes. K3 looks similar, having what appear to be
> > GL ranges (0x2121 .. 0x3771), arranged by radical+strokes, but they all
> > fall within CJK Extension A. The ranges given for K2 and K3 also overlap.
> > (They seem reminiscent of the "planes" of CNS 11643 / EUC-TW .)
> By K2 and K3 overlapping, you do not mean some characters in Ext. B are
> given references to both K2 and K3, do you? If not, it's natural and all
> right by the same token you said about the overlap of K0 and K1 ranges
> because it indicates that K2 and K3 have repertoires disjoint from each
> other (i.e. The intersection of K2 and K3 is a null set) just like K0
> and K1 do.

No, I don't mean that some characters are given references to both K2 and
K3, which is impossible in the format that unihan.txt uses. (That doesn't
mean it can't happen, though; e.g., a character can be in both GB 2312 and
GB 12345, but only a reference to the former, the G0 source, is

Thomas Chan

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:19 EDT