Re: Ideographic Description

From: John Cowan (cowan@locke.ccil.org)
Date: Wed Sep 08 1999 - 12:12:59 EDT


Marco.Cimarosti@icl.com scripsit:

> I understand that IDSs are a way of providing a description for an
> "ideograph" that is not (yet) encoded within the CJK Unified Ideographs.
> In other words, they permit a descriptive "spelling" of an ideograph,
> just like you would do with an alphabetic word (e.g. "Marco" -> "capital em,
> a, er, cee, o").

Just so.

> 1) What is IDS really for? Why has this feature been introduced in
> ISO-10646?

As you say, so that ideographs that cannot be coded directly can at least
be described. There is no hope of coding every existing ideograph,
because there is no authoritative list of all ideographs that have
ever been used anywhere (new documents are periodically dug up, literally,
that are written on tortoise shell or bone), and new ideographs are
constantly, though slowly, being created.

> 2) Will these additions be integrated in Unicode as well?

Yes. Unicode 3.0 will contain them.

> 3) Document [1] explicitly states that an IDS "describes the ideograph
> in the abstract form. It is not interpreted as a composed character and does
> not have any rendering implication." -- OK: pretty rendering of IDSs is not
> *required* of conformant applications, but is it *forbidden*?

I don't see how it could be *forbidden*.

> 4) Would it be conformant to use an IDS in place of a character already
> encoded within CJK Unified Ideographs?

Yes, as long as you understand that the IDS is something different from
the encoded character, not equivalent to it in any sense: ordinary Unicode
processes will not recognize the identity.
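
To make the non-equivalence concrete, here is a minimal Python sketch. The
choice of U+597D and of the sequence U+2FF0 U+5973 U+5B50 as its description
is my own illustration, and the helper name is hypothetical. It only checks
the four standard normalization forms via the stdlib unicodedata module:

```python
import unicodedata

# Illustrative pair (an assumption for this sketch, not from any mapping file):
encoded = "\u597D"            # an encoded CJK Unified Ideograph
ids = "\u2FF0\u5973\u5B50"    # IDS: U+5973 to the left of U+5B50

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def same_under_any_normalization(a: str, b: str) -> bool:
    """Return True if a and b become identical under any normalization form."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )

# The IDS stays a three-character sequence; nothing folds it into U+597D.
print(same_under_any_normalization(encoded, ids))
print([f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFKC", ids)])
```

Comparison, searching and sorting will likewise treat the two as unrelated
strings unless an application adds its own mapping.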

> 5) What if one only uses Description Components (DC) from the new
> "Kangxi Radicals" and "CJK Radicals Supplement": would it be possible to
> build valid IDSs for *all* the encoded CJK Unified Ideographs using only
> these elements?

Probably not. The traditional analysis uses 214 radicals (the KangXi set)
and about 1000 phonetics.
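
As a rough check on the radical half of that count, the sketch below
enumerates the Kangxi Radicals block with Python's unicodedata module. The
block range U+2F00..U+2FD5 is taken from the Unicode code charts; the
variable names are mine. It says nothing about the phonetics, which have no
block of their own.

```python
import unicodedata

# Kangxi Radicals block, per the Unicode code charts.
KANGXI_FIRST, KANGXI_LAST = 0x2F00, 0x2FD5

radicals = [chr(cp) for cp in range(KANGXI_FIRST, KANGXI_LAST + 1)]

print(len(radicals))                   # number of code points in the block
print(unicodedata.name(radicals[0]))   # name of the first radical

# Each radical carries a compatibility mapping to a unified ideograph,
# so NFKC turns the radical character into the ordinary ideograph.
print(unicodedata.normalize("NFKC", radicals[0]) == "\u4E00")
```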

> 6) Some of the Kangxi radicals (especially those with stroke number >=
> 10) could be expressed with an IDS, using simpler components. Would it be
> considered conformant to make an IDS that "decomposes" a Kangxi radical?

Conformant to what? IDSes are compact *descriptions*, the equivalent of
writing "(Insert an ideograph here that looks like a *foo* above a *bar*)".
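
To show what such a description amounts to, here is a small Python sketch
that expands an IDS into that kind of English phrase. The operator code
points U+2FF0..U+2FFB and their arities come from the standard; the English
wording, the function names and the sample sequences are my own assumptions,
and the sketch does not enforce any length limits the standard may impose.

```python
# Ideographic Description Characters and a phrase template for each.
BINARY = {
    "\u2FF0": "{0} to the left of {1}",
    "\u2FF1": "{0} above {1}",
    "\u2FF4": "{0} fully surrounding {1}",
    "\u2FF5": "{0} surrounding {1} from above",
    "\u2FF6": "{0} surrounding {1} from below",
    "\u2FF7": "{0} surrounding {1} from the left",
    "\u2FF8": "{0} surrounding {1} from the upper left",
    "\u2FF9": "{0} surrounding {1} from the upper right",
    "\u2FFA": "{0} surrounding {1} from the lower left",
    "\u2FFB": "{0} overlaid on {1}",
}
TERNARY = {
    "\u2FF2": "{0}, {1} and {2} side by side, left to right",
    "\u2FF3": "{0}, {1} and {2} stacked, top to bottom",
}

def _parse(s: str, i: int) -> tuple[str, int]:
    """Parse one component starting at index i; return (phrase, next index)."""
    if i >= len(s):
        raise ValueError("IDS ended before all operands were supplied")
    ch = s[i]
    if ch in BINARY:
        template, arity = BINARY[ch], 2
    elif ch in TERNARY:
        template, arity = TERNARY[ch], 3
    else:
        return ch, i + 1          # a plain component stands for itself
    parts = []
    i += 1
    for _ in range(arity):        # operators are prefix, so recurse per operand
        part, i = _parse(s, i)
        parts.append(part)
    return "(" + template.format(*parts) + ")", i

def describe(ids: str) -> str:
    """Turn a whole IDS into an English description."""
    phrase, end = _parse(ids, 0)
    if end != len(ids):
        raise ValueError("extra characters after a complete IDS")
    return phrase

print(describe("\u2FF1\u8279\u5316"))              # one operator
print(describe("\u2FF1\u2FF0\u6728\u76EE\u5FC3"))  # nested operators
```

Nothing here draws anything: the output is a phrase for a human reader, which
is all an IDS promises.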

> 7) Will ISO/IEC ever publish a list of IDSs for existing CJK Unified
> Ideographs? (I.e. a sort of decomposition mapping file)?

I sort of doubt it, but there is nothing stopping *you* from doing so.

> As you may have guessed by my inappropriate terminology, I have absolutely
> no liaison with any standards body or committee (oh, well, have been a private
> member of Unicode, but just for one year), and I discovered these documents
> only by chance. However, the possibility of implementing an IDS renderer
> sounds very appealing to me, because it reminds me of an idea - that I have
> been cherishing for a long time - of an 8-bit character set for CJK
> characters, that only encodes the smallest possible set of basic
> "components".

In that case, the stroke level probably makes more sense than the component
level. Stroke writing order is standardized, and probably 40-50 distinct
stroke types would cover everything.

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin


