Re: Unicode and transliteration

From: Jon Babcock (jon@kanji.com)
Date: Thu Aug 26 1999 - 14:38:46 EDT


> In most cases, Unicode characters
> are defined in terms of abstract orthographic units, sometimes
> called graphemes, but it is considered the exception to define
> a Unicode character in terms of glyphs, which is what most
> people would probably think of as "surface".

One exception is the Han repertoire of >20,000 graphs. The graphemic
approach was not used for this group. Nor are they defined wholly in
terms of glyphs; often several glyph variants are unified into one
Unicode 'character', thank heavens. But many glyph variants are not,
for various reasons. Chinese script was not broken down into its
graphemes for inclusion in Unicode (not an easy task, but possible);
had that been done, the result would have closely corresponded to the
approach used for English and most, if not all, of the other languages. But of
course Unicode did not attempt to record every variant Chinese glyph
either. So what is encoded in the standard for the Han repertoire is
not really a character or a glyph. It is roughly as if there were
a separate code point for each English root. Please correct
me if I'm wrong.
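A rough illustration of the point, as it looks in current Unicode data. This is a sketch that uses only Python's standard unicodedata module. The specific characters are illustrative choices and do not come from the discussion above: 說 (U+8AAA) and its variant 説 (U+8AAC), plus the compatibility ideograph U+F900. Each Han character is a single code point with no decomposition into smaller graphemic components. The variant pair 說/説 was not unified and stays distinct under every normalization form. By contrast, U+F900 carries a canonical mapping to the unified ideograph U+8C48, so NFC folds it away. Exact results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point, Unicode name, and decomposition mapping of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} decomposition={unicodedata.decomposition(ch)!r}"

# Two glyph variants of the "same" Han character, encoded as separate code points.
traditional = "\u8AAA"  # 說
variant = "\u8AAC"      # 説

for ch in (traditional, variant):
    # Unified ideographs have an empty decomposition: no breakdown into components.
    print(describe(ch))

# Because neither has a decomposition, no normalization form merges the pair.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, traditional) != unicodedata.normalize(form, variant)

# A CJK compatibility ideograph, in contrast, has a canonical (singleton)
# decomposition to a unified ideograph, so NFC replaces it.
compat = "\uF900"
print(describe(compat))
print(f"NFC of U+F900 -> U+{ord(unicodedata.normalize('NFC', compat)):04X}")
```

Whether two Han variants share a code point is therefore a property of the encoding itself. Normalization does not recover it. Variant relationships such as 說/説 are recorded separately, in the Unihan database.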

Jon

--
Jon Babcock <jon@kanji.com>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT