Re: Repertoire, encoding, and representation (Was: Charsets + encoding + codesets)

From: John Cowan (
Date: Thu Oct 09 1997 - 23:34:08 EDT

Keld Simonsen writes:

> OK. I was misled by believing that a Unicode "abstract character" could
> consist of more than one code point. This is not so, I learnt.
> Many of my arguments have been based on that misunderstanding.

Well, it can have two code points, but only if they form a surrogate pair.
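
To make the surrogate-pair case concrete, here is a minimal Python sketch of
the arithmetic, assuming the usual ranges (high surrogates D800..DBFF, low
surrogates DC00..DFFF, together covering 10000..10FFFF). The function names
are hypothetical, chosen only for illustration; nothing here is taken from
the standard's text.

```python
def to_surrogate_pair(value):
    """Split a value in 0x10000..0x10FFFF into (high, low) surrogates."""
    if not 0x10000 <= value <= 0x10FFFF:
        raise ValueError("only values above 0xFFFF need a surrogate pair")
    offset = value - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go in the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into a single value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1D11E)
    print(f"{high:04X} {low:04X}")
    print(f"{from_surrogate_pair(high, low):X}")
```

Neither half of the pair means anything on its own; only the two together
stand for one abstract character.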

> OK. I was here misled by the term "surrogate character" indicating
> that this was a character - and I also think they were explained to be
> characters by some Unicode people on the Unicode list some time ago, but
> I do not think there is a need to find that citation. I think the
> "character" in "surrogate character" is misleading, you probably should
> use some other word here, maybe "surrogate code point", or (why not?)
> "RC-elements".

Remember that "character" is not a formally defined term in Unicode;
"abstract character" is defined, and so are "base character",
"combining character", and lots of others, but "character" by itself
is a purely informal term.

