Re: Ideographic Description

From: John Jenkins (jenkins@apple.com)
Date: Wed Sep 08 1999 - 11:47:07 EDT


You may want to check my talk at last week's International Unicode
Conference, which discusses these in some detail. The URL is:

http://fonts.apple.com/WhitePapers/IUC15Han.pdf

>
> 1) What is IDS really for? Why has this feature been introduced in
> ISO-10646?
>

They are intended to provide a means of including unencoded ideographs in
text. The presumption is that there will never be a time when *all*
possible ideographs in actual use (present or past) will be formally
encoded, and some mechanism is required to handle the missing ones.

> 2) Will these additions be integrated in Unicode as well?
>

Yes. They are a part of Unicode 3.0.

> 3) Document [1] explicitly states that an IDS "describes the ideograph
> in the abstract form. It is not interpreted as a composed character and does
> not have any rendering implication." -- OK: pretty rendering of IDSs is not
> *required* of conformant applications, but is it *forbidden*?
>

Unicode requires that IDCs have some visual appearance. Applications
may choose to parse an IDS and render the ideograph it describes, but that
isn't recommended.
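
For anyone who does want to experiment with parsing, here is a minimal
sketch in Python. It assumes only the twelve IDCs at U+2FF0..U+2FFB, of which
U+2FF2 and U+2FF3 take three operands and the rest take two. The function
name parse_ids and the nested-tuple tree shape are made up for illustration;
nothing like them is defined by the standard.

```python
# Sketch only: arity table for the twelve IDCs U+2FF0..U+2FFB.
TERNARY_IDCS = {"\u2FF2", "\u2FF3"}
BINARY_IDCS = {chr(c) for c in range(0x2FF0, 0x2FFC)} - TERNARY_IDCS


def parse_ids(text, pos=0):
    """Parse one IDS starting at text[pos] and return (tree, next_pos).

    A tree is either a single character (a description component) or a
    tuple (idc, child, child[, child]).
    """
    if pos >= len(text):
        raise ValueError("truncated IDS")
    ch = text[pos]
    if ch in BINARY_IDCS:
        arity = 2
    elif ch in TERNARY_IDCS:
        arity = 3
    else:
        # Any non-IDC character is treated as a leaf component.
        return ch, pos + 1
    pos += 1
    children = []
    for _ in range(arity):
        child, pos = parse_ids(text, pos)
        children.append(child)
    return (ch, *children), pos


# U+2FF0 (left-to-right) applied to U+6C35 and a nested U+2FF1 (above-to-below).
tree, end = parse_ids("\u2FF0\u6C35\u2FF1\u53E3\u5DDD")
print(tree, end)
```

Even this much only recovers the structure. Turning the tree into a
well-proportioned glyph is a separate and much harder problem.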

> 4) Would it be conformant to use an IDS in place of a character already
> encoded within CJK Unified Ideographs?
>

No. Unicode adds to 10646 the formal requirement that an IDS be as short as
possible, which would mean that using an IDS to describe an already encoded
ideograph is non-conformant. Even more explicitly, Unicode says that this
is forbidden.

> 5) What if one only uses Description Components (DC) from the new
> "Kangxi Radicals" and "CJK Radicals Supplement": would it be possible to
> build valid IDSs for *all* the encoded CJK Unified Ideographs using only
> these elements?
>

No. Not by a long shot. Most of the common phonetic elements of ideographs
don't occur in either radical block.

It should be pointed out that Unicode considers radicals and ideographs
semantically distinct (although that distinction is blurred in the case of
IDSs).
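
To make the radical/ideograph distinction concrete, here is a small Python
sketch. It assumes the block ranges U+2E80..U+2EFF (CJK Radicals Supplement)
and U+2F00..U+2FDF (Kangxi Radicals), and relies on the character data
bundled with Python's unicodedata module, which reflects a later version of
Unicode than 3.0. The helper names is_radical and describe are hypothetical.

```python
import unicodedata

# Block ranges for the two radical blocks (range end is exclusive).
CJK_RADICALS_SUPPLEMENT = range(0x2E80, 0x2F00)
KANGXI_RADICALS = range(0x2F00, 0x2FE0)


def is_radical(ch):
    """True if ch lies in one of the two radical blocks."""
    cp = ord(ch)
    return cp in CJK_RADICALS_SUPPLEMENT or cp in KANGXI_RADICALS


def describe(ch):
    """Return the code point and character name for display."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))


kangxi_one = "\u2F00"      # KANGXI RADICAL ONE
ideograph_one = "\u4E00"   # the unified ideograph of the same shape

print(describe(kangxi_one), is_radical(kangxi_one))
print(describe(ideograph_one), is_radical(ideograph_one))

# The two are distinct characters and do not compare equal...
print(kangxi_one == ideograph_one)
# ...although compatibility normalization maps the radical to the ideograph.
print(unicodedata.normalize("NFKC", kangxi_one) == ideograph_one)
```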

> 6) Some of the Kangxi radicals (especially those with stroke number >=
> 10) could be expressed with an IDS, using simpler components. Would it be
> considered conformant to make an IDS that "decomposes" a Kangxi radical?
>

No.

> 7) Will ISO/IEC ever publish a list of IDSs for existing CJK Unified
> Ideographs? (I.e. a sort of decomposition mapping file)?
>

No.

> As you may have guessed by my inappropriate terminology, I have absolutely
> no liaison with any standards body or committee (oh, well, have been a private
> member of Unicode, but just for one year), and I discovered these documents
> only by chance. However, the possibility of implementing an IDS renderer
> sounds very appealing to me, because it reminds me of an idea - that I have
> been cherishing for a long time - of an 8-bit character set for CJK
> characters, that only encodes the smallest possible set of basic
> "components".
>

It has been tried before. There are a number of problems.

1) There is too much ambiguity. Any scheme sufficiently powerful to reduce
the set of some 80,000 to 100,000 required ideographs to a set of 256 root
forms plus combining controls would also fall afoul of the various alternate
shapes that a single ideograph can take, plus ambiguities in the process of
breaking ideographs into pieces.

2) There is *enormous* overhead in trying to render IDSs. There is
enormous overhead in even trying to parse them for the sake of cursor
movement and line breaking. We don't even want to talk about working on
semantic equivalents for searching/replacing or the ramifications of
collation.

Unicode minimizes this overhead by stating that none of this need be done.
If there had been any requirement that Unicode conformance would imply
parsing and dealing with IDSs, they never would have made it into the
standard.
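
To give a feel for what even the cheapest of these tasks involves, here is a
rough Python sketch of finding cursor stops in text that contains IDSs, for
an application that chooses to treat each IDS as one unit. It assumes the
twelve IDCs at U+2FF0..U+2FFB (U+2FF2 and U+2FF3 ternary, the rest binary).
The names ids_end and cursor_stops are hypothetical, and a real
implementation would also have to cope with combining marks, surrogates in
UTF-16, and malformed sequences.

```python
# Operand counts for the twelve IDCs U+2FF0..U+2FFB.
IDC_ARITY = {chr(c): 2 for c in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def ids_end(text, start):
    """Return the index just past the IDS (or single character) at start.

    Keeps a count of operands still owed: each character satisfies one,
    and each IDC adds as many as its arity. If the text runs out first,
    the incomplete sequence is taken to extend to the end of the text.
    """
    pending = 1
    pos = start
    while pending and pos < len(text):
        pending += IDC_ARITY.get(text[pos], 0) - 1
        pos += 1
    return pos


def cursor_stops(text):
    """Return the offsets at which a cursor may rest, IDSs being atomic."""
    stops = [0]
    while stops[-1] < len(text):
        stops.append(ids_end(text, stops[-1]))
    return stops


sample = "A\u2FF0\u6C35\u2FF1\u53E3\u5DDDB"
print(cursor_stops(sample))
```

Note that every cursor movement has to scan forward from a known boundary;
there is no way to tell from a character in the middle of the text alone
whether it is an operand of some earlier IDC.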

=====
John H. Jenkins
jenkins@apple.com
tseng@blueneptune.com
http://www.blueneptune.com/~tseng


