Re: CCS and CEF definitions in UTR #17

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Mon Apr 24 2000 - 21:39:01 EDT


On Fri, Apr 21, 2000 at 05:08:43PM -0800, Mike Brown wrote:
>
> Here is what I want someone to tell me:
>
> 1. The set of integers in a coded character set can include integers that
> are not assigned to abstract characters.

I think this cannot be true, by definition, in the sense that even for the
private use area the mapping is to unspecified (abstract) characters.
The "surrogates" are not part of the coded character set but part of the
character encoding: they are extra values used for the encoding of the
coded character set proper.

(I put "abstract" in parentheses because it is not an ISO 10646 term but
a Unicode term, amongst others. However, I like it and use it myself as
a clarification of the ISO "character" term.)

> 2. Code unit sequences defined by a character encoding form can map to
> integers that are part of a coded character set but that have not been
> assigned to abstract characters.

If you are thinking of the "surrogates", then rather think of a surrogate
pair as one 32-bit value that is the encoding of an integer representing an
abstract character in the planes beyond the BMP, much like a UTF-8 sequence
of more than one octet represents a character beyond U+007F.
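
To make the comparison concrete, here is a minimal sketch in Python of how
a pair of 16-bit surrogate values stands for one integer beyond the BMP.
The function names are my own, chosen for illustration; the arithmetic is
the usual UTF-16 one. The last lines show the UTF-8 analogy: one integer,
several octets.

```python
def to_surrogates(cp):
    """Split an integer beyond the BMP into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("value is not in the planes beyond the BMP")
    offset = cp - 0x10000              # 20-bit offset past the BMP
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recover the single integer that a surrogate pair encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# One integer in plane 1, encoded as two 16-bit code units.
pair = to_surrogates(0x1D11E)
print([hex(unit) for unit in pair])
print(hex(from_surrogates(*pair)))

# The UTF-8 analogy: one integer beyond 0x7F, encoded as two octets.
print(chr(0xE9).encode("utf-8"))
```

In both cases the individual code units are artifacts of the encoding; only
the recovered integer belongs to the coded character set.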

Kind regards
Keld


