RE: Terminology question: character-like thing

From: Marco.Cimarosti@icl.com
Date: Wed Sep 29 1999 - 09:15:36 EDT


I don't have any answers, sorry: I am more of a question asker than an
answer giver myself.

I just wanted to point out that you describe combining characters as
*preceding* base characters. In Unicode it is the other way round, so
you should say: "a single non-combining character *followed by* zero, one or
several combining characters".
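To make the ordering concrete, here is a minimal Python sketch that splits a string into runs of one base character followed by its combining marks. It assumes that "combining" means General_Category Mn, Mc or Me, and it ignores the finer segmentation rules (Hangul jamo, joiners, etc.); the function names are hypothetical. It also shows that the "e with ogonek and acute" example from the quoted message, which has no precomposed code point, is the same sequence up to canonical equivalence however it is spelled.

```python
import unicodedata
from typing import List

def is_combining(ch: str) -> bool:
    # Simplification: any character whose General_Category is a mark
    # (Mn, Mc or Me) is treated as combining.
    return unicodedata.category(ch).startswith("M")

def combining_sequences(text: str) -> List[str]:
    """Split text into runs of one base character followed by its marks."""
    sequences: List[str] = []
    for ch in text:
        if is_combining(ch) and sequences:
            # A mark attaches to the base character that PRECEDES it.
            sequences[-1] += ch
        else:
            # A non-combining character (or a mark with nothing before it)
            # starts a new sequence.
            sequences.append(ch)
    return sequences

# e + COMBINING OGONEK + COMBINING ACUTE ACCENT, then a plain x.
seqs = combining_sequences("e\u0328\u0301x")
print([[f"U+{ord(c):04X}" for c in s] for s in seqs])
# [['U+0065', 'U+0328', 'U+0301'], ['U+0078']]

# Three spellings of the same thing: marks in either order, or the
# precomposed U+0119 (e with ogonek) plus a combining acute.
forms = ["e\u0328\u0301", "e\u0301\u0328", "\u0119\u0301"]
print(len({unicodedata.normalize("NFD", s) for s in forms}))  # 1
```

Canonical reordering in NFD sorts the ogonek (combining class 202) before the acute (230), which is why all three spellings collapse to one decomposed form.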

I understand that implementers of TTY-like terminal software would prefer it
the other way round, to avoid look-ahead or backtracking: with the Unicode
order, a terminal cannot know that a character cell is complete until the
next non-combining character arrives (see a recent discussion about this).
Diacritics that precede the base character are in fact already used by some
national keyboard drivers, where the preceding diacritics are called "dead
keys".
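A rough Python sketch of how an input layer could translate dead-key order into Unicode order: the driver buffers the marks until the base character arrives, then emits the base first and the marks after it, and finally normalizes to NFC. The key names and the `DEAD_KEYS` table are hypothetical, not taken from any real keyboard driver.

```python
import unicodedata

# Hypothetical dead-key names mapped to the combining marks they stand for.
DEAD_KEYS = {
    "dead_acute": "\u0301",      # COMBINING ACUTE ACCENT
    "dead_grave": "\u0300",      # COMBINING GRAVE ACCENT
    "dead_diaeresis": "\u0308",  # COMBINING DIAERESIS
    "dead_ogonek": "\u0328",     # COMBINING OGONEK
}

def keystrokes_to_text(keys):
    """Turn dead-key-first keystrokes into base-first Unicode text (NFC)."""
    out = []
    pending = []  # marks from dead keys still waiting for a base character
    for key in keys:
        if key in DEAD_KEYS:
            pending.append(DEAD_KEYS[key])
        else:
            # Unicode order: the base character first, then its marks.
            out.append(key)
            out.extend(pending)
            pending.clear()
    # Dead keys with no following base are emitted as bare marks here;
    # a real driver might output a spacing accent instead.
    out.extend(pending)
    return unicodedata.normalize("NFC", "".join(out))

print(keystrokes_to_text(["dead_acute", "e"]) == "\u00e9")  # True
print([f"U+{ord(c):04X}" for c in keystrokes_to_text(["dead_ogonek", "dead_acute", "e"])])
# ['U+0119', 'U+0301']
```

The buffering simply moves the waiting to the input side: the keyboard layer holds state until the base arrives, whereas with Unicode order it is the display side that cannot finish a cell until it sees what follows.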

Regards.
        Marco

> -----Original Message-----
> From: Juliusz Chroboczek [SMTP:jec@dcs.ed.ac.uk]
> Sent: 1999 September 29, Wednesday 14.36
> To: Unicode List
> Subject: Terminology question: character-like thing
>
> I still have a problem with Unicode terminology.
>
> I think I understand the concept of glyph. It is my understanding
> that Unicode defines the set of characters as being in one-to-one
> correspondence with codepoints; thus, we have non-combining characters
> and combining characters. There also is an equivalence on strings of
> characters (or, equivalently, finite sequences of codepoints), whence
> the canonical representatives (``normalisation forms''). (I'm
> glossing over the fact that there are actually several notions of
> equivalence.)
>
> Now, it seems to me that underlying all of this there is a notion of
> ``not-necessarily encoded non-combining character'' (NNENCCS) that
> corresponds to a sequence of zero, one or several combining characters
> followed by a single non-combining character (taken up to equivalence,
> of course). Think of the set of non-Unicode characters as the set of
> all precomposed forms that might conceivably be encoded in Unicode
> (although, of course, they won't, for very good reasons). Examples of
> NNENCCS are things such as LATIN SMALL LETTER E WITH OGONEK AND ACUTE
> or ARABIC LETTER ALIF WITH DOT ABOVE.
>
> Does this notion make sense? Note here that I'm not assuming that the
> NNENCCSes are in one-to-one correspondence with glyphs, and I think the
> notion is pretty natural for, say, Arabic too, as it makes sense to
> speak of the ARABIC LETTER HEH WITH ACUTE without specifying the form
> of the HEH.
>
> What's the official name of a NNENCCS?
>
> Thanks,
>
> J.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT