Re: Charsets + encoding + codesets

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Tue Oct 07 1997 - 17:39:21 EDT


John Cowan writes:

> Keld Jørn Simonsen wrote:
>
> > The trouble is that the "repertoire" of Unicode and 10646 is different.
> > 10646 is clear on what is the repertoire: it is the characters of all
> > its code points. Unicode is clear on "abstract characters": it says
> > you can make abstract characters by combining a number of characters
> > such as a base letter and then one or more combining accents.
> > But the combinations are not defined or limited, so for Unicode
> > you have an unlimited repertoire of Unicode abstract characters.
>
> Actually, no. The definition of "abstract character" in Unicode 2.0
> (page 3-4) is the same as the definition of "character" in ISO 10646.
> The term "character", on the other hand, is not defined at all
> in the normative parts of the Unicode Standard.

Yes, that is also what I have been assuming in my postings:
a Unicode "abstract character" is (roughly) equivalent to a 10646
"character".

But then Ken says that Unicode "abstract characters" can be made from
base-letter + combining-sequence, and I normally trust Ken to
represent Unicode correctly.

> The glossary (page G-2) defines "character" in five different ways,
> one of which is as a synonym for "abstract character". The
> other definitions are: "the smallest component of written language
> that has semantic value"; "a 16-bit unit of textual information";
> synonym for "code value"; synonym for "Han ideograph".
>
> There is, as far as I can tell, no single term used in the Unicode
> Standard for what you are calling an "abstract character" above.
> I would like there to be one, myself.

10646 has the term "composite sequence".
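
To make the distinction concrete, here is a minimal sketch of a base
letter followed by combining characters, and of why such sequences are
not limited to the coded repertoire. It assumes Python's standard
unicodedata module and its NFC normalization, neither of which is part
of this discussion. The helper names (is_composite_sequence,
code_points) are hypothetical, and "combining character" is only
approximated by the general categories Mn, Mc and Me.

```python
import unicodedata


def is_composite_sequence(text):
    """Return True if text is one non-combining character followed by
    one or more combining characters.

    Assumption: a "combining character" is approximated by any code
    point whose general category starts with "M" (Mn, Mc, Me).
    """
    if len(text) < 2:
        return False
    base, marks = text[0], text[1:]
    base_is_mark = unicodedata.category(base).startswith("M")
    return not base_is_mark and all(
        unicodedata.category(ch).startswith("M") for ch in marks
    )


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


# "e" + COMBINING ACUTE ACCENT: a precomposed code point exists for it.
e_acute = "e\u0301"
# "q" + COMBINING ACUTE ACCENT: no single precomposed code point.
q_acute = "q\u0301"

# NFC composes a sequence into one code point where one is available.
e_nfc = unicodedata.normalize("NFC", e_acute)
q_nfc = unicodedata.normalize("NFC", q_acute)

print(is_composite_sequence(e_acute), code_points(e_nfc))
print(is_composite_sequence(q_acute), code_points(q_nfc))
```

Both inputs are a base letter plus a combining accent, but only the
first corresponds to a single coded character. The second stays a
two-element sequence, and nothing stops further accents from being
appended to it.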

Keld


