Re: Abstract character?

From: Lars Marius Garshol (larsga@garshol.priv.no)
Date: Tue Jul 23 2002 - 05:53:25 EDT


* Kenneth Whistler
|
| Abstract character
|
| that which is encoded; an element of the repertoire (existing
| independent of the character encoding standard, and often
| identifiable in other character encoding standards, as well
| as the Unicode Standard); the implicit basis of transcodings.
|
| Note that while in some sense abstract characters exist a
| priori by virtue of the nature of the units of various writing
| systems, their exact nature is only pinned down at the point
| that an actual encoding is done. They are not always obvious,
| and many new abstract characters may arise as the result of
| particular textual processing needs that can be addressed by
| characters. (E.g. WORD JOINER, OBJECT REPLACEMENT CHARACTER,
| etc., etc.)

This helps a little, but not all that much. I think spelling out the
details of how the term relates to the other terms would help.

The rest of the definitions were quite clear.
 
* Lars Marius Garshol
|
| - are all assigned Unicode characters also abstract characters?
 
* Kenneth Whistler
|
| Yes. Or rather: all encoded characters are assigned to abstract
| characters.

Hmmmm. OK. So combining diacritics are also abstract characters? (I
was also unclear on ZWNJ and similar things, but you explicitly
mention characters of that kind above, so...)
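
For reference, a small Python sketch that looks up the characters
discussed so far in the Unicode Character Database. It only shows that
each is an assigned (encoded) character with its own name and general
category; the helper name `describe` is made up for illustration, and
the output depends on the Unicode version shipped with the Python
build in use.

```python
import unicodedata

# Code points mentioned in this thread: a combining diacritic, ZWNJ,
# WORD JOINER, and OBJECT REPLACEMENT CHARACTER.
samples = ["\u030A", "\u200C", "\u2060", "\uFFFC"]

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in samples:
    print(describe(ch))
```

None of these is a "letter" in the everyday sense, yet each has an
encoded character assigned to it.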
 
| (Note above -- abstract characters are also a concept which applies
| to other character encodings besides the Unicode Standard, and not
| all encoded characters in other character encodings automatically
| make it into the Unicode Standard, for various architectural
| reasons.)

Right. So VIQR, for example, also has abstract characters, then?
 
* Lars Marius Garshol
|
| - do <U+00C5> (Å) and <U+0041, U+030A> (A followed by combining ring
| above) represent the same abstract character?
 
* Kenneth Whistler
|
| Yes. That is the implicit claim behind a specification of canonical
| equivalence.

Right. Then I think I've more-or-less got it.
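
The canonical equivalence of the two spellings can be checked with a
minimal Python sketch, assuming the standard `unicodedata` module. The
function name `canonically_equivalent` is my own, not something taken
from the Unicode Standard.

```python
import unicodedata

precomposed = "\u00C5"        # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = "\u0041\u030A"   # LATIN CAPITAL LETTER A + COMBINING RING ABOVE

def canonically_equivalent(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The code point sequences differ...
print(precomposed == decomposed)
# ...but they compare equal once both are normalized.
print(canonically_equivalent(precomposed, decomposed))
```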

This helped a lot. Thank you!

However, it does raise a new problem. Isn't the definition of 'string'
in the XPath specification then wrong?

  Strings consist of a sequence of zero or more characters, where a
  character is defined as in the XML Recommendation [XML]. A single
  character in XPath thus corresponds to a single Unicode abstract
  character with a single corresponding Unicode scalar value (see
  [Unicode]); [...]
                          <URL: http://www.w3.org/TR/xpath#strings >

As far as I can tell, one of these two claims must be wrong. That is,
either a single XPath character does not necessarily correspond to a
single Unicode abstract character, or else a single XPath character
need not correspond to a single scalar value.

Does that sound reasonable?
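
To make the mismatch concrete, here is a rough Python sketch. It
assumes that Python's `len()` on a `str`, which counts code points, is
a fair stand-in for counting XPath characters; the helper
`scalar_values` is hypothetical and only there for display.

```python
def scalar_values(s):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

precomposed = "\u00C5"        # one scalar value
decomposed = "\u0041\u030A"   # two scalar values, same abstract character

for s in (precomposed, decomposed):
    print(len(s), scalar_values(s))
```

Under that assumption, the decomposed spelling counts as two
characters even though, per the answer above, it represents a single
abstract character.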

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >


