RE: CJK combining components

From: James E. Agenbroad (jage@loc.gov)
Date: Wed Oct 18 2000 - 13:52:23 EDT


On Wed, 18 Oct 2000 Marco.Cimarosti@icl.com wrote:

> Doug Ewell wrote:
> > Marco Cimarosti <Marco.Cimarosti@icl.com> wrote:
> > > Carl W. Brown:
> > >> An article in the October 12, 2000 issue of Linux Weekly News
> > >> <http://lwn.net/bigpage.php3> tries to explain the benefit...
> >
> > Actually, that quote from Linux Weekly News came from me, not Carl.
> > (I'm not trying to take credit for the research, just deflecting any
> > criticism away from Carl.)
>
> My mistake, sorry. And thanks to Doug for providing this info.
>
> However, I was not criticizing that article -- nor defending GCS! --, but
> rather annoying the list (once more!) about the pros and cons of CJK
> characters seen as atomic units, as opposed to composed graphemes.
>
> This topic is so boring probably because it is a chicken-egg problem: a CJK
> ideograph is in fact a "character", just like any alphabetic letter is, but
> it is also a "compound" that can be analyzed in smaller elements, much
> like the jamos in a Hangul syllable, or the letters (and diacritics) in a
> word.
>
> David Starner wrote:
> > If you can decompose the CJK characters into pieces and automatically
> > recompose them, what stops you from doing that for Unicode?
>
> Yeah! Nothing can stop me! (Well, apart maybe time and budget
> considerations, and the fact that I am not in the fonts business -- but
> that's nobody's problem :-)
>
> > The only problem is that you have to decompose the Unicode CJK
> > characters yourself, and you still have the table look ups,
> > but there's no need to carry around a huge font.
>
> OK. But, in a hypothetical encoding by components, this look up wouldn't be
> necessary at all.
>
> And in a hypothetical "mixed" encoding (i.e., having both precomposed
> ideographs and combining elements), it would only be needed for
> normalization (i.e. when you want the text to be either all precomposed or
> all decomposed).
>
> > Even if you have to work with preexisting Unicode technology,
> > you could still make the font using that technology instead of doing
> > everything by hand.
>
> Yes, I see your point: provided that ideographic decomposition really has
> some utility, this utility is not necessarily in the encoding.
>
> This is true, and a good point, but not necessarily a definitive argument
> against the theoretical possibility of a decomposed encoding.
>
> Compatibility with the existing practice is the only argument that convinces
> me (sort of) that Unicode provides the best possible encoding for CJK
> logographs.
>
> _ Marco
>
                                           Wednesday, October 18, 2000
Doesn't one also need to somehow specify the relative position of the
parts to each other? Just specifying the components of a character won't
suffice if the top half has three components and the bottom half has one
component above another on one side and just one on the right. There are
templates for this but I think it is not trivial.
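
A minimal sketch of the point, assuming the "templates" are something like
Unicode's Ideographic Description Characters (U+2FF0..U+2FFB), written in
prefix notation. The function names parse_ids and components are
hypothetical, and the letters a..f in the usage note are placeholders, not
real components. The sketch only shows that two characters can share the
same components and still differ in layout; it does not render anything.

```python
# Number of operands each Ideographic Description Character takes.
# Assumption: only the range U+2FF0..U+2FFB is handled here.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # left / middle / right
IDC_ARITY["\u2FF3"] = 3  # above / middle / below


def parse_ids(seq):
    """Parse a prefix-notation description sequence into a nested tuple."""
    def parse(i):
        ch = seq[i]
        arity = IDC_ARITY.get(ch)
        if arity is None:
            # Not a layout operator: treat it as a leaf component.
            return ch, i + 1
        i += 1
        parts = []
        for _ in range(arity):
            node, i = parse(i)
            parts.append(node)
        return (ch, *parts), i

    tree, end = parse(0)
    if end != len(seq):
        raise ValueError("trailing characters after a complete description")
    return tree


def components(tree):
    """Return the sorted leaf components, discarding all layout information."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for sub in tree[1:]:
        leaves.extend(components(sub))
    return sorted(leaves)


# U+674F is "tree" above "mouth"; U+5446 is "mouth" above "tree".
tree_over_mouth = parse_ids("\u2FF1\u6728\u53E3")
mouth_over_tree = parse_ids("\u2FF1\u53E3\u6728")

# Same components, different arrangement: a component list alone is ambiguous.
same_parts = components(tree_over_mouth) == components(mouth_over_tree)
same_layout = tree_over_mouth == mouth_over_tree
```

Under the further assumption that the three top components sit side by
side, the layout described above (three components on top; below, one
component over another on the left and a single one on the right) would be
written with placeholders as "\u2FF1\u2FF2abc\u2FF0\u2FF1def", which
parse_ids turns into a two-level tree.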
     Regards,
          Jim Agenbroad ( jage@LOC.gov )
     The above are purely personal opinions, not necessarily the official
views of any government or any agency of any.
Phone: 202 707-9612; Fax: 202 707-0955; US mail: I.T.S. Dev.Gp.4, Library
of Congress, 101 Independence Ave. SE, Washington, D.C. 20540-9334 U.S.A.


