I still have a problem with Unicode terminology.
I think I understand the concept of a glyph. It is my understanding
that Unicode defines the set of characters as being in one-to-one
correspondence with codepoints; thus, we have non-combining characters
and combining characters. There also is an equivalence on strings of
characters (or, equivalently, finite sequences of codepoints), whence
the canonical representatives (``normalisation forms''). (I'm
glossing over the fact that there are actually several notions of
equivalence.)
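To make the equivalence concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `canonically_equivalent` is purely illustrative and not Unicode terminology. It compares two strings by reducing both to the same normalisation form (NFD here), and the last lines show that compatibility equivalence is a coarser notion than canonical equivalence.

```python
import unicodedata

# Two different codepoint sequences for the same abstract "e with acute".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Illustrative helper: compare the canonical decompositions (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# As raw codepoint sequences the two strings differ...
same_codepoints = precomposed == decomposed

# ...but they fall in the same canonical equivalence class.
equivalent = canonically_equivalent(precomposed, decomposed)

# NFC picks the composed representative of that class.
nfc = unicodedata.normalize("NFC", decomposed)

# Compatibility equivalence is coarser: the "fi" ligature U+FB01 is
# compatibility-equivalent, but not canonically equivalent, to "fi".
ligature_canonical = canonically_equivalent("\ufb01", "fi")
ligature_compat = unicodedata.normalize("NFKC", "\ufb01") == "fi"
```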
Now, it seems to me that underlying all of this there is a notion of
``not-necessarily encoded non-combining character'' (NNENCCS) that
corresponds to a single non-combining character followed by a sequence
of zero, one or several combining characters (taken up to equivalence,
of course). Think of the set of NNENCCSes as the set of
all precomposed forms that might conceivably be encoded in Unicode
(although, of course, they won't, for very good reasons). Examples of
NNENCCSes are things such as LATIN SMALL LETTER E WITH OGONEK AND ACUTE
or ARABIC LETTER ALEF WITH DOT ABOVE.
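The notion can be sketched in code, again with Python's standard `unicodedata` module. The function `combining_sequences` is a hypothetical name of my own, and treating every character of general category M as combining is a simplifying assumption; it is not meant as a full text segmentation algorithm. The sketch splits a string into units of one non-combining character plus the combining characters attached to it, then takes the NFC representative of each unit.

```python
import unicodedata


def combining_sequences(text: str) -> list[str]:
    """Split text into units: a non-combining character plus its marks.

    Simplifying assumption: any character whose general category starts
    with "M" (Mn, Mc, Me) counts as a combining character.
    """
    units: list[str] = []
    # Decompose first, so precomposed and decomposed input look the same.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch).startswith("M") and units:
            units[-1] += ch      # attach the mark to the preceding unit
        else:
            units.append(ch)     # start a new unit
    # Use the NFC form as the representative of each equivalence class.
    return [unicodedata.normalize("NFC", u) for u in units]


# "e with ogonek and acute": the two marks may be typed in either order.
a = combining_sequences("e\u0328\u0301x")   # e + ogonek + acute, then x
b = combining_sequences("e\u0301\u0328x")   # e + acute + ogonek, then x

# Both orders yield the same unit, and that unit still takes two
# codepoints after NFC, since no single precomposed codepoint exists.
first_unit_length = len(a[0])
```

Here `a[0]` is U+0119 (e with ogonek) followed by U+0301 (combining acute): one unit in the above sense, but not one encoded character.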
Does this notion make sense? Note here that I'm not assuming that the
NNENCCSes are in one-to-one correspondence with glyphs, and I think the
notion is pretty natural for, say, Arabic too, as it makes sense to
speak of the ARABIC LETTER HEH WITH ACUTE without specifying the form
of the HEH.
What's the official name of an NNENCCS?