Re: Charsets + encoding + codesets

From: John Cowan (cowan@drv.cbc.com)
Date: Tue Oct 07 1997 - 10:10:41 EDT


Keld Jørn Simonsen wrote:

> The trouble is that the "repertoire" of Unicode and 10646 is different.
> 10646 is clear on what is the repertoire: it is the characters of all
> its code points. Unicode is clear on "abstract characters" that
> you can make abstract characters by combining a number of characters
> such as a base letter and then one or more combining accents.
> But the combinations are not defined or limited, so for Unicode
> you have an unlimited repertoire of Unicode abstract characters.

Actually, no. The definition of "abstract character" in Unicode 2.0
(page 3-4) is the same as the definition of "character" in ISO 10646.
The term "character", on the other hand, is not defined at all
in the normative parts of the Unicode Standard.

The glossary (page G-2) defines "character" in five different ways,
one of which is as a synonym for "abstract character". The
other definitions are: "the smallest component of written language
that has semantic value"; "a 16-bit unit of textual information";
synonym for "code value"; synonym for "Han ideograph".

There is, as far as I can tell, no single term used in the Unicode
Standard for what you are calling an "abstract character" above.
I would like there to be one, myself.
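
To make the distinction concrete, here is a minimal Python sketch (not taken from either standard) that contrasts three of the senses above: a coded character, a 16-bit code value, and a base letter followed by combining accents. The helper names code_points, code_units_16, and stack_accents are hypothetical, invented for this illustration, and it assumes Python 3, where len() on a string counts code points.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def code_points(text: str) -> int:
    """Number of coded characters (code points) in text."""
    return len(text)

def code_units_16(text: str) -> int:
    """Number of 16-bit code values when text is encoded as UTF-16."""
    return len(text.encode("utf-16-be")) // 2

def stack_accents(base: str, count: int) -> str:
    """Hypothetical helper: base letter followed by `count` acute accents."""
    return base + ACUTE * count

precomposed = "\u00e9"            # LATIN SMALL LETTER E WITH ACUTE
sequence = stack_accents("e", 1)  # "e" + COMBINING ACUTE ACCENT

# Same visible letter, different numbers of coded characters.
print(code_points(precomposed), code_points(sequence))

# Nothing limits how many combining marks may follow a base letter,
# so the set of such sequences is open-ended.
print(code_points(stack_accents("e", 5)))

# A character outside the 16-bit range needs two 16-bit code values.
print(code_points("\U00010000"), code_units_16("\U00010000"))

# A nonzero combining class marks the accent as a combining character.
print(unicodedata.name(ACUTE), unicodedata.combining(ACUTE))
```

The open-ended set produced by stack_accents is the "unlimited repertoire" Keld describes. The repertoire in the ISO 10646 sense is only the finite set of individually coded characters.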

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
			e'osai ko sarji la lojban


