RE: CJK combining components (was RE: "Giga ...)

From: Marco.Cimarosti@icl.com
Date: Thu Oct 19 2000 - 07:35:38 EDT


James E. Agenbroad wrote:
> If I had to make a guess it would be that transforming the
> glyphs of parts of characters so they will fit together in
> a pleasing fashion would take about as much effort as (or
> more than) designing separate glyphs for each new character.

Perhaps. I am a programmer, so I tend to think that writing programs is
easier than drawing big fonts. But my viewpoint could be biased; people
who have done both are probably better placed to judge.

> Doesn't one also need to somehow specify the relative position of the
> parts to each other? Just specifying the components of a
> character won't suffice if the top half has three components
> and the bottom half has one component above another on one side
> and just one on the right. There are templates for this but
> I think it is not trivial.

Of course the position of components has to be known, in one way or another.

One way is to specify it *explicitly*, and IDS (Ideographic Description
Sequences) are a good example of how this could work: the twelve IDC
operators (U+2FF0 to U+2FFB) can describe very precisely how components fit
within the character square.
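To make this concrete, here is a minimal Python sketch of reading an IDS
into a tree. It relies only on the arity of the IDCs: U+2FF2 and U+2FF3
take three operands, the other ten take two. The name parse_ids is made up
for illustration, and the sketch ignores any later additions to the IDC set.

```python
# Arity of the twelve Ideographic Description Characters U+2FF0..U+2FFB:
# U+2FF2 and U+2FF3 take three operands, all the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left to middle and right
IDC_ARITY["\u2FF3"] = 3  # above to middle and below


def parse_ids(ids: str):
    """Parse a prefix-notation IDS into nested tuples (operator, operands...).

    Plain components are returned as one-character strings.
    """
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("incomplete IDS")
        ch = ids[pos]
        pos += 1
        if ch in IDC_ARITY:
            # an operator: recursively read exactly as many operands as it takes
            return (ch, *[parse() for _ in range(IDC_ARITY[ch])])
        return ch  # a leaf component

    tree = parse()
    if pos != len(ids):
        raise ValueError("trailing characters after IDS")
    return tree
```

For example, parse_ids("⿱木⿰木木") (森) returns ("⿱", "木", ("⿰", "木", "木")).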

But, for the great majority of ideographs, this information could also be
*implicit*, because certain components (often corresponding to the
"radical" used in dictionaries, and normally the first one in writing
order) usually sit in a fixed default position within the ideograph (left,
top, around, etc.), while the rest has to fit in the remaining free space
(right, bottom, inside, etc.).
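A sketch of the implicit case in Python, assuming a hypothetical lookup
table DEFAULT_SLOT that records, for each leading component, the IDC that
would describe its usual position. The four entries below are only
illustrative; a real table would have to be compiled from data.

```python
# Hypothetical table: the default slot each leading component occupies,
# written as the IDC that would describe that position explicitly.
DEFAULT_SLOT = {
    "氵": "\u2FF0",  # water radical: on the left    (e.g. 河 = ⿰氵可)
    "艹": "\u2FF1",  # grass radical: on top         (e.g. 花 = ⿱艹化)
    "囗": "\u2FF4",  # enclosure: all around         (e.g. 国 = ⿴囗玉)
    "辶": "\u2FFA",  # walk radical: lower left      (e.g. 道 = ⿺辶首)
}


def implicit_to_ids(leading: str, rest: str) -> str:
    """Turn an implicit <leading, rest> pair into an explicit IDS string."""
    try:
        return DEFAULT_SLOT[leading] + leading + rest
    except KeyError:
        raise ValueError(f"no default position known for {leading!r}") from None
```

So implicit_to_ids("氵", "可") gives "⿰氵可" (河), and implicit_to_ids("辶", "首")
gives "⿺辶首" (道), without any operator in the encoded text itself.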

Jon Babcock is satisfied to stop here, and indeed two "holograms" can
greatly reduce the number of characters needed.

But I notice that, if the leading component (the "radical") is always
encoded in the first position (regardless of the writing order), then the
process can be made recursive: the second component would take up the free
space and define a new, smaller, free space for the third component to
squeeze in, and so on for all subsequent components.
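The recursion can be sketched as repeated subdivision of a box. In this
Python sketch the SLOT table (which side of the free space each component
claims, and what fraction of it) and its numbers are hypothetical, only
"left" and "top" are handled, and the last component simply gets whatever
space is left.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height; y grows downward

# Hypothetical per-component defaults: side of the free space claimed,
# and the fraction of that space taken.
SLOT: Dict[str, Tuple[str, float]] = {
    "氵": ("left", 0.3),
    "木": ("left", 0.4),
    "艹": ("top", 0.3),
    "口": ("left", 0.35),
}


def split(box: Box, side: str, frac: float) -> Tuple[Box, Box]:
    """Return (box claimed by the component, remaining free box)."""
    x, y, w, h = box
    if side == "left":
        return (x, y, w * frac, h), (x + w * frac, y, w * (1 - frac), h)
    if side == "top":
        return (x, y, w, h * frac), (x, y + h * frac, w, h * (1 - frac))
    raise ValueError(f"unsupported side {side!r}")


def layout(components: List[str], box: Box = (0.0, 0.0, 1.0, 1.0)) -> List[Tuple[str, Box]]:
    """Place each component in the space left free by the previous ones."""
    if not components:
        return []
    head, *tail = components
    if not tail:
        return [(head, box)]  # the last component fills the remaining space
    side, frac = SLOT[head]
    claimed, free = split(box, side, frac)
    return [(head, claimed)] + layout(tail, free)
```

With these numbers, layout(["艹", "氵", "各"]) (落) puts 艹 in the top 30% of
the square, 氵 in the left 30% of what remains, and 各 in the rest.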

The last component in the sequence is special, because it takes up all the
remaining space. All components would thus have two contextual glyphs: a
<joining> form, that leaves an empty space for the next component, and a
<final> form that takes up the whole square.
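Choosing between the two forms is then purely positional, much like
choosing Arabic contextual forms. A trivial Python sketch; the function
name and the "joining"/"final" tags are just labels chosen for
illustration.

```python
from typing import List, Tuple


def contextual_forms(components: List[str]) -> List[Tuple[str, str]]:
    """Tag each component with the contextual form it needs.

    Every component except the last gets the <joining> form (it leaves room
    for what follows); the last gets the <final> form (it fills the rest).
    """
    last = len(components) - 1
    return [(c, "final" if i == last else "joining") for i, c in enumerate(components)]
```

A one-component sequence gets only a <final> form, which is simply the
standalone glyph.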

The difference between the two contextual forms would normally be just in
the size and proportion of the glyph. But, in certain cases, there can also
be other differences; e.g., all components that have a horizontal stroke at
their bottom change this stroke to a slanted stroke in the <joining> form.
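Assuming, purely for illustration, that a component is stored as straight
strokes in a unit square (y grows downward), a <joining> form could be
derived mechanically like this. The stroke model, the slant amount and the
name joining_form are all hypothetical; real outlines are curves and would
need much more care.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = Tuple[Point, Point]             # straight segment; y grows downward
Box = Tuple[float, float, float, float]  # x, y, width, height


def joining_form(strokes: List[Stroke], box: Box, slant: float = 0.15) -> List[Stroke]:
    """Derive a hypothetical <joining> glyph from a full-square design.

    1. A horizontal stroke lying at the very bottom of the glyph becomes a
       rising stroke: its right-hand end is lifted by `slant`.
    2. Every stroke is then scaled and moved into the component's box.
    """
    bottom = max(max(p[1], q[1]) for p, q in strokes)
    x0, y0, w, h = box
    out = []
    for (px, py), (qx, qy) in strokes:
        if py == qy == bottom:
            # lift the right-hand endpoint (smaller y means higher up)
            if px < qx:
                qy -= slant
            else:
                py -= slant
        out.append(((x0 + px * w, y0 + py * h), (x0 + qx * w, y0 + qy * h)))
    return out


# A crude 土 (made-up coordinates): two horizontals and one vertical.
EARTH = [((0.2, 0.4), (0.8, 0.4)), ((0.5, 0.1), (0.5, 0.9)), ((0.05, 0.9), (0.95, 0.9))]
```

Squeezing EARTH into the left 40% of the square, as in 地, gives
joining_form(EARTH, (0.0, 0.0, 0.4, 1.0)): the two upper strokes are only
narrowed, while the bottom one now rises from (0.02, 0.9) to (0.38, 0.75).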

A few operators à la IDS are still needed to handle special cases, i.e.
when a "radical" sits in an unusual position.
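In Python, the rule "an explicit operator overrides the default" could look
like this. The example: 口 normally sits on the left (吃 = ⿰口乞), but in 呆
it sits on top (⿱口木), so that character would carry an explicit operator.
The DEFAULT_SLOT table and the name resolve are hypothetical.

```python
# All twelve IDCs, U+2FF0..U+2FFB.
IDC = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}

# Hypothetical defaults: the slot a leading component normally occupies.
DEFAULT_SLOT = {
    "口": "\u2FF0",  # mouth: normally on the left, as in 吃
    "木": "\u2FF0",  # tree: normally on the left, as in 林
}


def resolve(sequence: str) -> str:
    """Return an explicit IDS for a two-component sequence.

    If the sequence already starts with an IDC, that operator overrides
    the default and the sequence is returned unchanged; otherwise the
    leading component's default slot is prepended.
    """
    if sequence and sequence[0] in IDC:
        return sequence
    if len(sequence) != 2:
        raise ValueError("expected exactly two components")
    return DEFAULT_SLOT[sequence[0]] + sequence
```

Only the exceptions pay the cost of an extra code point; the common case
stays as short as possible.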

To complete the story, our hypothetical encoding could also have
font-dependent ligatures. Many scripts require special glyphs to fine-tune
the appearance of special sequences, like Latin <f+f+i>, or Arabic
<laam+alif>. Similarly, special glyphs could exist for groups of components
whose default rendering is not satisfactory.
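Such a ligature step is a simple substitution pass over the component
sequence. This Python sketch uses greedy longest-match, which is my own
simplifying assumption (real font formats order their lookups explicitly),
and the table contents and glyph names are hypothetical; the same function
would work on ideographic components.

```python
from typing import Dict, List, Tuple

# Hypothetical font-specific ligature table: sequences whose default
# rendering is unsatisfactory get a dedicated glyph.
LIGATURES: Dict[Tuple[str, ...], str] = {
    ("f", "f", "i"): "f_f_i",
    ("f", "i"): "f_i",
    ("f", "f"): "f_f",
}


def apply_ligatures(seq: List[str], table: Dict[Tuple[str, ...], str]) -> List[str]:
    """Replace runs found in `table`, preferring the longest match at each position."""
    longest = max((len(k) for k in table), default=0)
    out, i = [], 0
    while i < len(seq):
        # try the longest candidate first, down to two elements
        for n in range(min(longest, len(seq) - i), 1, -1):
            key = tuple(seq[i:i + n])
            if key in table:
                out.append(table[key])
                i += n
                break
        else:
            # no ligature starts here: keep the element as it is
            out.append(seq[i])
            i += 1
    return out
```

For instance, apply_ligatures(list("office"), LIGATURES) yields
["o", "f_f_i", "c", "e"].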

Anyway, I think everybody has probably had quite enough of these
daydreams of mine and others'.
So, if anyone wishes to go on chatting about this, shouldn't we do it
privately?

_ Marco



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:14 EDT