From: John H. Jenkins (jenkins@apple.com)
Date: Sun Oct 28 2007 - 19:52:58 CST
There are actually two different mechanisms incorporated into Unicode  
to allow some form of representation of unencoded ideographs.  The  
first is the Ideographic Variation Indicator (U+303E), and the other  
is the Ideographic Description Sequence mechanism.  Both of these are  
relatively crude graphically, although using IDSs you could probably  
come up with a reasonable visual representation of the shape intended  
most of the time.  They are, however, ideal for embedding in text.
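To make the "embedding in text" point concrete, here is a minimal Python sketch (not part of any Unicode tooling) showing that an IDS, or a U+303E prefix, is just a run of ordinary code points. The helper name `describe` is made up for this illustration, and the sample strings are only examples.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' line per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# An IDS is prefix notation: a description character followed by its parts.
ids = "\u2ff1\u7acb\u65e9"      # U+2FF1, then U+7ACB and U+65E9
# U+303E precedes an encoded ideograph that the intended shape resembles.
variant = "\u303e\u7ae0"        # U+303E, then U+7AE0

for line in describe(ids) + describe(variant):
    print(line)
```

Because both forms are plain character sequences, they survive anywhere ordinary Unicode text does; how they are drawn is left to the rendering system.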
There is also the CDL mechanism being worked on by Wenlin.  This is  
XML-based and so is not really appropriate for embedding in plain  
text, but it is also capable of showing considerably greater  
flexibility in providing a precise visual representation of the  
intended shape.
On the whole, however, the user community currently strongly favors  
the one-ideograph, one-Unicode-character approach.
The fundamental problem with a component-based approach to *encoding*  
(as opposed to representation) is the ambiguity involved.  It is  
frequently possible to break down a character in more than one way.  A  
simple example of this is the common character U+7AE0 (章), which  
could be represented using IDSs as ⿱音十, ⿱立早, or ⿳立日十  
(plus other possibilities caused by compatibility ideographs  
and encoded radicals).  Trying to define a normalization for IDSs and  
allow for multiple spellings in searching or sorting would be a  
monumental task; this is one of the main reasons why component-based  
systems have never really gained momentum as a way to formally encode  
unencoded characters.
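The ambiguity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the decomposition table `DECOMP` is a hand-written, hypothetical fragment (not taken from any standard data file), and the helper names `expand`, `parse`, and `leaves` are invented here. It shows that the three spellings of U+7AE0 stay distinct under Unicode normalization and even after every component is fully expanded.

```python
import unicodedata

# The three IDS spellings of U+7AE0 mentioned above.
SPELLINGS = ["⿱音十", "⿱立早", "⿳立日十"]

# Hypothetical decomposition table, for illustration only.
DECOMP = {"音": "⿱立日", "早": "⿱日十"}

# Number of operands taken by the description characters used here.
ARITY = {"⿱": 2, "⿳": 3}

def expand(ids):
    """Recursively replace each component that has an entry in DECOMP."""
    return "".join(expand(DECOMP[c]) if c in DECOMP else c for c in ids)

def parse(ids, pos=0):
    """Parse a prefix-notation IDS into a nested tuple; return (tree, next_pos)."""
    ch = ids[pos]
    pos += 1
    if ch not in ARITY:
        return ch, pos
    children = []
    for _ in range(ARITY[ch]):
        child, pos = parse(ids, pos)
        children.append(child)
    return (ch, *children), pos

def leaves(tree):
    """Flatten a parsed IDS into its leaf components, top to bottom."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]

# Standard normalization leaves every spelling unchanged.
normalized = [unicodedata.normalize("NFKC", s) for s in SPELLINGS]

# Full expansion still yields three different strings (different nesting).
expanded = [expand(s) for s in SPELLINGS]

# Only after discarding the structure do the spellings coincide.
flattened = [leaves(parse(e)[0]) for e in expanded]

print(normalized == SPELLINGS, len(set(expanded)), flattened)
```

Flattening happens to work here only because every operator in the example stacks its parts vertically. It throws away layout information, so it is not a general normalization, which is exactly the difficulty described above.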
=====
John H. Jenkins
jenkins@apple.com