Re: An Absurdly Brief Introduction to Unicode (was Re: Perception ...)

From: Mark Davis (markdavis34@home.com)
Date: Fri Feb 23 2001 - 11:57:20 EST


In somewhat more detail:

In general, a single abstract character corresponds to a single code point.
However, due to the requirement of compatibility with legacy code sets, plus
some inherent fuzziness in what constitutes abstract characters, there are
cases where this is not true:

- one abstract character can correspond to two different code points
- one abstract character can correspond to a sequence of two code points
- one code point can correspond to two different abstract characters
- one code point can correspond to a sequence of two abstract characters
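
The four cases above can be illustrated with a small sketch using Python's standard unicodedata module. The specific characters are assumptions picked for illustration and are not named in this message: ANGSTROM SIGN for the first case, A WITH GRAVE for the second, HYPHEN-MINUS for the third, and the "fi" ligature for the fourth (seen through its compatibility decomposition). The last check looks at the U+0051 U+0300 sequence from the quoted question below; it shows only that no precomposed code point exists, not whether the sequence counts as one abstract character.

```python
import unicodedata


def code_points(s):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Case 1: one abstract character, two different code points.
# ANGSTROM SIGN (U+212B) is canonically equivalent to
# LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
angstrom_nfc = unicodedata.normalize("NFC", "\u212B")

# Case 2: one abstract character, a sequence of two code points.
# U+00C0 decomposes canonically to U+0041 U+0300.
a_grave_nfd = unicodedata.normalize("NFD", "\u00C0")

# Case 3: one code point, two different abstract characters.
# U+002D serves as both a hyphen and a minus sign (illustrative choice).
hyphen_minus_name = unicodedata.name("\u002D")

# Case 4: one code point, a sequence of two abstract characters.
# The ligature U+FB01 has a compatibility decomposition to "f" + "i"
# (illustrative choice).
fi_nfkc = unicodedata.normalize("NFKC", "\uFB01")

# From the quoted question: U+0051 U+0300 has no precomposed form,
# so NFC leaves it as two code points.
q_grave_nfc = unicodedata.normalize("NFC", "\u0051\u0300")

print(code_points(angstrom_nfc))  # U+00C5
print(code_points(a_grave_nfd))   # U+0041 U+0300
print(hyphen_minus_name)          # HYPHEN-MINUS
print(fi_nfkc)                    # fi
print(code_points(q_grave_nfc))   # U+0051 U+0300
```

Normalization only exposes the equivalences that the standard records as decompositions; the third case is a matter of usage and cannot be detected this way.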

Mark

----- Original Message -----
From: "John Cowan" <jcowan@reutershealth.com>
To: "Mark Davis" <markdavis34@home.com>
Cc: "Unicode List" <unicode@unicode.org>
Sent: Friday, February 23, 2001 08:21
Subject: Re: An Absurdly Brief Introduction to Unicode (was Re: Perception
...)

> Mark Davis wrote:
>
>
> >> A _code_point_ is an integer value which is assigned to an abstract
> >> character. Each character receives a unique code point.
> >
> >
> > inaccurate. Multiple *abstract characters* can have a single code point;
> > multiple code points can correspond to a single *abstract character*.
>
> TUS 3.0 is vague on this, but I suppose what is meant is that if two
> single characters are canonically equivalent, they constitute only one
> abstract character. Does U+0041 U+0300 represent one abstract
> character (the same as the abstract character represented by U+00C0)
> or two consecutive abstract characters? If the former, does U+0051
> U+0300 also represent an abstract character?
>
> --
> There is / one art || John Cowan <jcowan@reutershealth.com>
> no more / no less || http://www.reutershealth.com
> to do / all things || http://www.ccil.org/~cowan
> with art- / lessness \\ -- Piet Hein
>


