Re: Repertoire, encoding, and representation (Was: Charsets + encoding + codesets)

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Thu Oct 09 1997 - 13:22:57 EDT


Kenneth Whistler writes:

> > > I'll state this one more time, because Keld keeps claiming it isn't
> > > so:
> > >
> > > The repertoire of the Unicode Standard and of ISO/IEC 10646 are
> > > *exactly* the same.
> >
> > That is possible, but then the definitions of "repertoire"
> > are different for the two specifications. "I have 3 apples and
> > you have 3 oranges. We have the same." :-)
>
> Not true.

OK. I was misled into believing that a Unicode "abstract character" could
consist of more than one code point. This is not so, I have learnt.
Many of my arguments were based on that misunderstanding.

> > And what about the
> > "surrogates"? These are genuine characters in Unicode
> > but not so in 10646.
>
> Keld, this is another egregious piece of disinformation. The Unicode
> "surrogate characters" are exactly the same as the "RC-elements"
> specified in definition 4.30 of Amendment 1 to 10646 (UTF-16).
> Surrogate characters have no independent interpretation as characters--
> they are only interpretable as a pair of high-surrogate + low-surrogate
> codes.

OK. Here I was misled by the term "surrogate character", which suggests
that this is a character. I also think they were explained to be
characters by some Unicode people on the Unicode list some time ago, but
I do not think there is a need to find that citation. I think the
"character" in "surrogate character" is misleading; you should probably
use some other word here, maybe "surrogate code point", or (why not?)
"RC-elements".

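To make the point about pairs concrete, here is a minimal sketch in
Python of the UTF-16 arithmetic that turns a high-surrogate + low-surrogate
pair into a single code position and back. The function names
(combine_surrogates, split_to_surrogates) are made up for illustration and
do not come from either standard; only the ranges D800-DBFF and DC00-DFFF
and the 0x10000 offset are taken from the UTF-16 definition.

```python
# Hypothetical helper names; the ranges and the offset follow UTF-16.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high-surrogate range
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low-surrogate range


def combine_surrogates(high, low):
    """Map a high/low surrogate pair to the code position it represents."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError("not a high surrogate: %#06x" % high)
    if not LOW_START <= low <= LOW_END:
        raise ValueError("not a low surrogate: %#06x" % low)
    # Each half contributes 10 bits; the pair addresses 0x10000..0x10FFFF.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def split_to_surrogates(code_position):
    """Inverse mapping: a code position above 0xFFFF becomes a pair."""
    if not 0x10000 <= code_position <= 0x10FFFF:
        raise ValueError("not representable as a surrogate pair")
    offset = code_position - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)
```

A single value from either range is rejected by combine_surrogates unless
it arrives with a partner from the other range, which is the sense in
which a surrogate has no interpretation on its own.
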
Keld


