Ideographic Description

From: Marco.Cimarosti@icl.com
Date: Wed Sep 08 1999 - 10:54:27 EDT


Hallo!

I am new to the list, so please excuse me if I am asking something that has
already been dealt with in the past, or if I am going off topic.

I am looking for information about some additions to ISO-10646 (and Unicode
too?) that are being discussed/balloted at ISO/IEC. The additions go under
these titles:
- New block for "Kangxi Radicals"
- New block for "CJK Radicals Supplement"
- New block for "Ideographic Description Characters" (IDC)
- "Ideographic Description Sequences" (IDS)

All I know about these topics comes from
http://anubis.dkuug.dk/jtc1/sc2/wg2/docs/documents.
I found these documents particularly informative:
[1] http://anubis.dkuug.dk/jtc1/sc2/wg2/docs/n1892.pdf
        ("PDAM 28 - Ideographic description characters", by Paterson/Zhang,
        1998-10-19)
[2] http://anubis.dkuug.dk/jtc1/sc2/wg2/docs/n1923.pdf
        ("Text for combined PDAM registration and consideration ballot -
        PDAM 15 - Kang Xi Radicals and CJK Radical Supplements - SC2 N3213",
        by Paterson, 1998-10-28)

I understand that IDSs are a way of providing a description for an
"ideograph" that is not (yet) encoded within the CJK Unified Ideographs.
In other words, they allow a descriptive "spelling" of an ideograph, just as
you would spell out an alphabetic word (e.g. "Marco" -> "capital em, a, er,
cee, o").

Document [1] describes an IDS as follows: "an IDC followed by a fixed number
of Description Components (DC). A DC may be one of the following:"
        "- a coded ideograph" (that is, a regular CJK Unified Ideograph)
        "- a coded radical" (one of the new characters in 'Kangxi Radicals'?)
        "- a coded ideographic component" (one of the new characters in
          'CJK Radicals Supplement'?)
        "- another IDS" (which makes the definition recursive, allowing for
          IDSs of arbitrary length)

So, an IDS is an expression made up of prefix operators (the IDCs) and
operands, which are either regular ideographs or special ideograph
components.
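
As a rough illustration of this reading of the grammar, here is a minimal
Python sketch of a recursive prefix-expression parser. It rests on
assumptions that are not in document [1] as quoted above: the IDC code
points (U+2FF0 to U+2FFB) and the operand counts (three for U+2FF2 and
U+2FF3, two for the others) are taken from the block as later published in
Unicode 3.0, and the function names are hypothetical. It also does not check
that an operand really is an ideograph, radical, or component; any non-IDC
character is accepted as a leaf.

```python
# Assumption: IDC block U+2FF0..U+2FFB as later published in Unicode 3.0.
# U+2FF2 and U+2FF3 take three operands; all other IDCs take two.
TERNARY_IDCS = {"\u2FF2", "\u2FF3"}


def idc_arity(ch):
    """Return the operand count if ch is an IDC, else 0."""
    if "\u2FF0" <= ch <= "\u2FFB":
        return 3 if ch in TERNARY_IDCS else 2
    return 0


def parse_dc(text, pos=0):
    """Parse one description component starting at text[pos].

    Returns (tree, next_pos). A tree is either a single character
    (treated as a leaf) or a tuple (idc, [operand trees]).
    """
    if pos >= len(text):
        raise ValueError("unexpected end of sequence")
    ch = text[pos]
    arity = idc_arity(ch)
    if arity == 0:
        # Leaf: not validated further in this sketch.
        return ch, pos + 1
    operands = []
    pos += 1
    for _ in range(arity):
        # Each operand may itself be a nested IDS (the recursive case).
        operand, pos = parse_dc(text, pos)
        operands.append(operand)
    return (ch, operands), pos


def parse_full_ids(text):
    """Parse text as exactly one IDS and return its tree."""
    tree, end = parse_dc(text)
    if end != len(text):
        raise ValueError("trailing characters after IDS")
    if isinstance(tree, str):
        raise ValueError("an IDS must start with an IDC")
    return tree
```

For example, parse_full_ids("\u2FF0\u6C35\u6BCF") (a left-to-right IDC
followed by two components) yields a tree with one operator and two leaves,
and a sequence whose second operand is itself an IDS yields a nested tuple.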

Sorry for this verbose introduction, but I needed to explain what I was
talking about.
Here come the questions:

1) What is IDS really for? Why has this feature been introduced in
ISO-10646?

2) Will these additions be integrated into Unicode as well?

3) Document [1] explicitly states that an IDS "describes the ideograph
in the abstract form. It is not interpreted as a composed character and does
not have any rendering implication." -- OK: pretty rendering of IDSs is not
*required* of conformant applications, but is it *forbidden*?

4) Would it be conformant to use an IDS in place of a character already
encoded within CJK Unified Ideographs?

5) What if one only uses Description Components (DC) from the new
"Kangxi Radicals" and "CJK Radicals Supplement" blocks: would it be possible
to build valid IDSs for *all* the encoded CJK Unified Ideographs using only
these elements?

6) Some of the Kangxi radicals (especially those with a stroke count >=
10) could be expressed with an IDS, using simpler components. Would it be
considered conformant to make an IDS that "decomposes" a Kangxi radical?

7) Will ISO/IEC ever publish a list of IDSs for existing CJK Unified
Ideographs (i.e. a sort of decomposition mapping file)?

As you may have guessed from my inappropriate terminology, I have absolutely
no liaison with any standards body or committee (oh, well, I have been a
private member of Unicode, but just for one year), and I discovered these
documents only by chance. However, the possibility of implementing an IDS
renderer sounds very appealing to me, because it reminds me of an idea that I
have been cherishing for a long time: an 8-bit character set for CJK
characters that encodes only the smallest possible set of basic
"components".

Regards.
        Marco Cimarosti, Italy


