RE: CJK combining components (was "Giga Character Set": Nothing but noise)

Date: Mon Oct 16 2000 - 12:43:02 EDT

Carl W. Brown:
> An article in the October 12, 2000 issue of Linux Weekly News
> <> tries to explain the benefit: "Many
> Asian characters are composites, made up of one or more simpler
> characters. Unicode simply makes a big catalog of characters, without
> recognizing their internal structure; GCS apparently handles things in
> a more natural manner." However, the article does not go on
> to specify just what is better, more efficient, or more "natural"
> about the GCS approach.

Unicode does in fact make a big catalog of CJK ideographs, without
recognizing their internal structure. Deal with it.
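To make this concrete, here is a minimal Python sketch that uses only the standard unicodedata module. It shows that a composite ideograph such as 好 ("good", visually 女 "woman" + 子 "child") is a single catalog entry in Unicode, and that no decomposition into components is recorded for it. The component table at the end is a hypothetical illustration of what a structure-aware encoding might store. It is not taken from GCS or from any standard.

```python
import unicodedata

# Unicode catalogs the ideograph as one opaque code point.
ch = "\u597D"  # 好
print(len(ch))                               # 1
print(f"U+{ord(ch):04X}")                    # U+597D
print(unicodedata.name(ch))                  # CJK UNIFIED IDEOGRAPH-597D

# The Unicode Character Database records no decomposition for it:
# its internal structure is simply not part of the encoding.
print(repr(unicodedata.decomposition(ch)))   # ''

# HYPOTHETICAL component table, for illustration only (not how GCS
# or any real standard encodes characters): 好 = 女 + 子.
HYPOTHETICAL_COMPONENTS = {"\u597D": ("\u5973", "\u5B50")}
print(HYPOTHETICAL_COMPONENTS[ch])           # ('女', '子')
```

The catalog approach keeps every character a fixed-size, self-contained unit, which is exactly what makes it compatible with existing software. Any structural knowledge has to live outside the encoding, in tables like the hypothetical one above.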

But Unicode is in good company: this is the standard approach used by all
CJK encoding standards, and it was used even earlier in lead-type typography.
This approach has several advantages, the most important of which is
compatibility with existing software.
But there are also a few drawbacks, of course. For example: designing and
validating a CJK font becomes a behemoth enterprise; adding new characters
means changing the encoding; the repertoire is doomed to be incomplete;
huge fonts are needed (and, indeed, looking up the proper glyph can be quite
costly); shape-based input methods require huge tables; etc.

However, going back to GCS, a single potentially interesting idea is not
quite enough to turn sand into gold.

_ Marco
