Date: Tue Jul 23 2002 - 10:30:14 EDT

On 07/22/2002 03:38:50 PM Kenneth Whistler wrote:

>Abstract character
> that which is encoded; an element of the repertoire (existing
> independent of the character encoding standard, and often
> identifiable in other character encoding standards, as well
> as the Unicode Standard); the implicit basis of transcodings.


>> - do <U+00C5> (Å) and <U+0041, U+030A> (A followed by combining ring
>> above) represent the same abstract character?
>Yes. That is the implicit claim behind a specification of canonical

This brings to mind another question: what's the relationship between
character sequences and abstract characters? Does < 0041, 030A > represent
a single abstract character or a sequence of abstract characters? Ken's
answer above suggests a single abstract character. Actually, the question
that's really bothering me is the next one.

Moving one step further (perhaps you already guessed where I was going),
what of < 1000, 102D, 102F >? Whether we consider it a single abstract
character, or a sequence of abstract characters, the more important
question to me is whether it is the same abstract character (sequence) as
< 1000, 102F, 102D >. The only answer that makes sense to me is that they
are the same abstract character sequence. But they are not canonically
equivalent! Is the inverse of your statement true? I.e., is it true that
lack of canonical equivalence implies a distinction in abstract character
(sequences)?
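
For anyone who wants to check the two cases mechanically, here is a rough
sketch. It assumes Python's standard unicodedata module, whose tables track
whatever Unicode version ships with the interpreter; the helper name
canonically_equivalent is mine, not anything from the standard. It only
tests canonical equivalence and says nothing about abstract character
identity, which is exactly the open question.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare the fully decomposed (NFD) forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Latin case: precomposed A-ring versus A followed by combining ring above.
precomposed = "\u00C5"
decomposed = "\u0041\u030A"

# Myanmar case: KA with vowel signs I and U in the two possible orders.
ka_i_u = "\u1000\u102D\u102F"
ka_u_i = "\u1000\u102F\u102D"

latin_equivalent = canonically_equivalent(precomposed, decomposed)
myanmar_equivalent = canonically_equivalent(ka_i_u, ka_u_i)

# Canonical combining classes of the two vowel signs; canonical reordering
# only rearranges marks whose classes are non-zero.
myanmar_classes = [unicodedata.combining(c) for c in "\u102D\u102F"]

print(latin_equivalent, myanmar_equivalent, myanmar_classes)
```

Both Myanmar vowel signs have canonical combining class 0, so normalization
never reorders them, and the two orderings stay distinct in every
normalization form.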

