I greatly appreciate the time you've taken to explain all this.
> So think of UTF-16 and UTF-8 as providing sequences of code
> units that map to *all* of the integers in the code space.
I'm happy to accept your explanations that the code space is a set of
integers available for a coded character set and a character encoding form
is a mechanism for converting those integers to code unit sequences.
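To make that model concrete, here is a minimal Python sketch that treats each encoding form as a plain function from one integer in the code space to a list of code units. It assumes present-day Unicode definitions, which postdate this message: only 0..0x10FFFF is accepted, and the surrogate range 0xD800..0xDFFF is rejected. The function names `utf16_code_units` and `utf8_code_units` are hypothetical, written only for illustration.

```python
def _check(n: int) -> None:
    # Assumption: present-day Unicode rules. The code space is 0..0x10FFFF,
    # and surrogate code points are refused by the encoding forms.
    if not 0 <= n <= 0x10FFFF:
        raise ValueError(f"{n:#x} is outside the code space 0..0x10FFFF")
    if 0xD800 <= n <= 0xDFFF:
        raise ValueError(f"{n:#x} is a surrogate code point")


def utf16_code_units(n: int) -> list[int]:
    """Map one integer to a sequence of 16-bit code units (UTF-16)."""
    _check(n)
    if n < 0x10000:
        return [n]  # BMP: a single code unit with the same value
    v = n - 0x10000  # 20-bit offset into the supplementary planes
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]  # high, low surrogate


def utf8_code_units(n: int) -> list[int]:
    """Map one integer to a sequence of 8-bit code units (UTF-8)."""
    _check(n)
    if n < 0x80:
        return [n]
    if n < 0x800:
        return [0xC0 | (n >> 6), 0x80 | (n & 0x3F)]
    if n < 0x10000:
        return [0xE0 | (n >> 12), 0x80 | ((n >> 6) & 0x3F), 0x80 | (n & 0x3F)]
    return [0xF0 | (n >> 18), 0x80 | ((n >> 12) & 0x3F),
            0x80 | ((n >> 6) & 0x3F), 0x80 | (n & 0x3F)]


print([hex(u) for u in utf16_code_units(0x1D11E)])  # ['0xd834', '0xdd1e']
print([hex(u) for u in utf8_code_units(0x1D11E)])   # ['0xf0', '0x9d', '0x84', '0x9e']
```

Written this way, an integer "occurs in" a form exactly when the function has an output for it. The surrogate range is the part of the code space where this sketch deliberately has no output.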
> If you want to be strict about it
I don't *want* to be pedantic :) It's just that when I read things like "Code
positions from 0000 D800 to 0000 DFFF are reserved for the UTF-16 form and
do not occur in UCS-4" in the UTF-16 amendment to ISO/IEC 10646-1, I want to
find out why these Unicode scalar values "do not occur in" the UCS-4
encoding form. Is it because the encoding form, which I don't have access
to, explicitly says it does not apply to that part of the code space, or is
it because the code space is not contiguous and omits those ranges?
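One way to see the practical effect of "do not occur in UCS-4" is how a modern codec behaves. This is an illustration under stated assumptions, not an answer to which document the exclusion comes from. Python 3.4 and later refuse to encode or decode surrogate code points with the `utf-32*` and `utf-16*` codecs, even though 0xD800..0xDFFF sits inside the numeric range 0..0x10FFFF. UTF-32 is the subset of UCS-4 restricted to Unicode scalar values. The helper names `encodable` and `decodable_utf32` are hypothetical.

```python
def encodable(cp: int, encoding: str) -> bool:
    """True if the single code point cp can be encoded with the named codec."""
    try:
        chr(cp).encode(encoding)  # chr() accepts lone surrogates in Python 3
    except UnicodeEncodeError:
        return False
    return True


def decodable_utf32(cp: int) -> bool:
    """True if the 4-byte big-endian value cp is accepted by the UTF-32 decoder."""
    try:
        cp.to_bytes(4, "big").decode("utf-32-be")
    except UnicodeDecodeError:
        return False
    return True


for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
    enc = encodable(cp, "utf-32-be") if cp <= 0x10FFFF else None
    print(f"U+{cp:04X}  encode: {enc}  decode: {decodable_utf32(cp)}")
```

In this behavior, the surrogate block and everything above 0x10FFFF are both rejected. The encodable set therefore has a hole in it, even though the numeric code space is written as one contiguous range.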
Looking to the Unicode Standard 3.0 and UTR #17 for answers, I ended up
finding that a conservative interpretation yielded more ambiguities than
Mike J. Brown, software engineer, Webb Interactive Services
XML/XSL stuff: http://www.skew.org/ http://www.webb.net/
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:02 EDT