Re: An Absurdly Brief Introduction to Unicode (was Re: Perception ...)

From: Peter_Constable@sil.org
Date: Fri Feb 23 2001 - 15:06:04 EST


On 02/23/2001 10:34:05 AM "Mark Davis" wrote:

>In somewhat more detail:
>
>In general, a single abstract character corresponds to a single code
>point. However, due to the requirement of compatibility with legacy code
>sets, plus some inherent fuzziness in what constitutes abstract
>characters, there are cases where this is not true:
>
>- one abstract character can correspond to two different code points
>- one abstract character can correspond to a sequence of two code points
>- one code point can correspond to two different abstract characters
>- one code point can correspond to a sequence of two abstract characters

Surely this is messing with the definition of "abstract character" in an
unhelpful way. Either that, or with the meaning of "can correspond to".
Abstract characters in Unicode are defined by the sum of their character
properties, including their names. There is a one-to-one mapping
between abstract characters and names, and also between names and
codepoints. Clearly, then, there is a one-to-one relationship between
abstract characters and codepoints. If one abstract character "can
correspond to" multiple distinct codepoints or a sequence of multiple
codepoints, then "can correspond to" involves either a canonical or
compatibility decomposition. That doesn't help make anything clearer, in my
opinion.
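
The two readings can be put side by side in a short sketch. It assumes
Python's standard unicodedata module; the example characters (A with ring
above, the angstrom sign, and the "fi" ligature) are illustrative choices
and are not taken from the thread. It checks that each name maps back to
exactly one codepoint, and that the "multiple codepoints" cases only
coincide once a canonical or compatibility mapping is applied.

```python
import unicodedata

# Illustrative characters (assumptions for this sketch, not from the thread).
A_RING = "\u00C5"        # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM = "\u212B"      # ANGSTROM SIGN
A_PLUS_RING = "A\u030A"  # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
FI_LIGATURE = "\uFB01"   # LATIN SMALL LIGATURE FI


def name_round_trips(ch: str) -> bool:
    """True if the character's name looks up to that same codepoint."""
    return unicodedata.lookup(unicodedata.name(ch)) == ch


def canonically_equivalent(a: str, b: str) -> bool:
    """True if two strings are identical after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    """True if two strings are identical after compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


if __name__ == "__main__":
    # Names and codepoints are one-to-one: distinct codepoints, distinct names.
    for ch in (A_RING, ANGSTROM, FI_LIGATURE):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), name_round_trips(ch))

    # As raw codepoints, the three spellings of "A with ring" all differ ...
    print(A_RING == ANGSTROM, A_RING == A_PLUS_RING)
    # ... and they coincide only under canonical equivalence.
    print(canonically_equivalent(A_RING, ANGSTROM))
    print(canonically_equivalent(A_RING, A_PLUS_RING))

    # The ligature matches "fi" only under a compatibility mapping.
    print(canonically_equivalent(FI_LIGATURE, "fi"))
    print(compatibility_equivalent(FI_LIGATURE, "fi"))
```

Comparing raw strings corresponds to the one-to-one view of codepoints;
the helper functions correspond to the looser "can correspond to" view,
which holds only after normalization.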

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


