Re: Abstract character?

From: Peter_Constable@sil.org
Date: Tue Jul 23 2002 - 10:30:14 EDT


On 07/22/2002 03:38:50 PM Kenneth Whistler wrote:

>Abstract character
>
> that which is encoded; an element of the repertoire (existing
> independent of the character encoding standard, and often
> identifiable in other character encoding standards, as well
> as the Unicode Standard); the implicit basis of transcodings.

[snip]

>> - do <U+00C5> (Å) and <U+0041, U+030A> (A followed by combining ring
>> above) represent the same abstract character?
>
>Yes. That is the implicit claim behind a specification of canonical
>equivalence.

This brings to mind another question: what's the relationship between
character sequences and abstract characters? Does < 0041, 030A > represent
a single abstract character or a sequence of abstract characters? Ken's
answer above suggests a single abstract character. Actually, the question
that's really bothering me is the next one.
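
The canonical equivalence referred to in the quoted exchange can be checked mechanically. The following is a minimal sketch using Python's standard unicodedata module; the helper name canonically_equivalent is hypothetical and introduced only for illustration. It shows that the two representations differ as code point sequences but compare equal after canonical decomposition (NFD). It does not settle whether the pair is one abstract character or a sequence of them.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: treat two strings as canonically equivalent
    when their NFD (canonical decomposition) forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00C5"       # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "\u0041\u030A"  # A followed by COMBINING RING ABOVE

# Raw code point comparison: the two strings are different sequences.
print(precomposed == decomposed)

# Comparison after normalization: the two strings are canonically equivalent.
print(canonically_equivalent(precomposed, decomposed))
```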

Moving one step further (perhaps you already guessed where I was going),
what of < 1000, 102D, 102F >? Whether we consider it a single abstract
character, or a sequence of abstract characters, the more important
question to me is whether it is the same abstract character (sequence) as
< 1000, 102F, 102D >. The only thing that makes sense is that they are the
same abstract character sequence. But they are not canonically
equivalent! Is the inverse of your statement true? I.e., is it true
that lack of canonical equivalence implies a distinction in abstract
character (sequences)?
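
For reference, the lack of canonical equivalence between the two Myanmar sequences can be observed directly. This is a minimal sketch, again using Python's standard unicodedata module; the variable names are illustrative, and the result reflects the Unicode data shipped with the Python build in use. Both vowel signs have canonical combining class 0, so normalization never reorders them and the two orderings stay distinct.

```python
import unicodedata

# The two orderings discussed above (variable names are illustrative).
i_then_u = "\u1000\u102D\u102F"  # KA, VOWEL SIGN I, VOWEL SIGN U
u_then_i = "\u1000\u102F\u102D"  # KA, VOWEL SIGN U, VOWEL SIGN I

# Canonical combining class of each vowel sign; canonical reordering
# only applies to marks with a nonzero class.
for ch in ("\u102D", "\u102F"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# Compare the canonical decompositions of the two sequences.
nfd_a = unicodedata.normalize("NFD", i_then_u)
nfd_b = unicodedata.normalize("NFD", u_then_i)
print(nfd_a == nfd_b)
```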

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


