Re: IDS question

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Apr 30 2001 - 14:33:38 EDT


Thomas Chan asked:

> I've recently been using Ideographic Description Sequences to describe
> some Han characters that are not in Unicode 3.1, and I noticed that
> U+3007 is not included in the set of "UnifiedIdeographs", despite having
> the "ideographic" property (TUS3.0, p. 269; UAX #27, section 10.1). I
> understand that compatibility ideographs are not allowed to participate
> in IDS, but U+3007 doesn't have a clone, as far as I know.

U+3007 was never considered to be part of the set of characters being
dealt with by the IRG for unification, as far as I know. Instead, it
was just treated as one more of the symbols that was mapped out of
the various East Asian character standards. There apparently never was
any unification issue for it, since no one would have encoded it twice
in a legacy character set, and there are no traditional variations in
its shape.

>
> There are some characters in LENG Yulong and WEI Yixin's _Zhonghua Zihai_
> dictionary (Beijing: Zhonghua, 1994), such as gu2 on p. 31 and lin2 on p.
> 32 that incorporate a circular component. I'd probably describe them as:
>
> gu2, p. 31:
> U+2FFB IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
> U+5341 (shi 'ten')
> U+3007 (ling 'zero')
>

>
> However, those aren't valid sequences. I realize the above two characters
> are rather odd, but the likes of U+3AB3 and U+3AC8 would have faced the
> same problem, since they also incorporate a circular component.

There are other characters that might be difficult to describe using
IDS. Many of the oddballs in Extension B could fall into this category.
Just on the first chart, U+20008, U+20067, U+20069, U+20073, and so on
might be hard to describe in terms of the IDC's, because of their odd
pieces.
Also, the use of IDC's was originally envisioned to include a
large number of "components", to complement the already encoded
radicals, so that the common pattern of sticking a radical onto a
component could be simply described in those instances where the
component itself does not constitute a stand-alone character. Don't
be surprised if China yet decides to submit hundreds of components for
encoding, just to cover this kind of situation.

However, I don't think the IDC's were intended to be a complete,
closed mechanism for describing any ideograph ever encountered, no
matter how bizarre (such as ideographs that just happened to be
miscarved on a wood block at some point in history).

>
> What would be the advisable way to handle these cases, besides
> creating invalid IDS sequences, using the PUA, or giving a prose
> description?

My suggestion would be that you just give prose descriptions, and
check in with the IRG that these are included in their sources for
work on Vertical Extension C.

For this particular instance, I suppose you could also apply to
the UTC with a proposal to add U+3007 to the IDS syntax, to make
these two descriptions "legal". I'm not sure it is worth the effort,
however.
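
To make the "legal" versus "invalid" distinction concrete, here is a
rough sketch of an IDS check in Python. It is only an illustration, not
the normative grammar: the ideograph ranges are my assumption for the
Unicode 3.1 repertoire, radicals and any length limit are left out, and
the names (is_valid_ids, extra) are made up for this example.

```python
# Sketch of an IDS well-formedness check. Ranges and grammar are
# simplifying assumptions, not the normative definition.

# IDCs taking two operands (U+2FF0, U+2FF1, U+2FF4..U+2FFB)
BINARY_IDCS = {0x2FF0, 0x2FF1} | set(range(0x2FF4, 0x2FFC))
# IDCs taking three operands (U+2FF2, U+2FF3)
TRINARY_IDCS = {0x2FF2, 0x2FF3}

# Assumed unified ideograph ranges as of Unicode 3.1:
# Extension A, the URO, and Extension B.
UNIFIED_RANGES = [(0x3400, 0x4DB5), (0x4E00, 0x9FA5), (0x20000, 0x2A6D6)]


def is_unified(cp, extra=()):
    """True if cp may appear as an operand; 'extra' models a
    hypothetical extension of the allowed set."""
    return cp in extra or any(lo <= cp <= hi for lo, hi in UNIFIED_RANGES)


def parse_ids(cps, pos=0, extra=()):
    """Consume one IDS starting at pos; return the index just past it."""
    if pos >= len(cps):
        raise ValueError("sequence ends before all operands are supplied")
    cp = cps[pos]
    if cp in BINARY_IDCS:
        arity = 2
    elif cp in TRINARY_IDCS:
        arity = 3
    elif is_unified(cp, extra):
        return pos + 1
    else:
        raise ValueError(f"U+{cp:04X} is not allowed in an IDS")
    pos += 1
    for _ in range(arity):
        pos = parse_ids(cps, pos, extra)
    return pos


def is_valid_ids(cps, extra=()):
    """True if the whole list of code points is exactly one IDS."""
    try:
        return parse_ids(cps, 0, extra) == len(cps)
    except ValueError:
        return False


# The gu2 description quoted above: OVERLAID, U+5341, U+3007
gu2 = [0x2FFB, 0x5341, 0x3007]
print(is_valid_ids(gu2))                  # False: U+3007 is not allowed
print(is_valid_ids(gu2, extra={0x3007}))  # True once U+3007 is admitted
```

The extra parameter stands in for the kind of syntax change a proposal
would ask for; nothing else in the check has to move.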

--Ken


