Definitions (Was: Re: Charsets + encoding + codesets)

From: John Clews (
Date: Thu Oct 09 1997 - 07:25:15 EDT

In message <> writes:
> Keld Jørn Simonsen wrote:
> > John Cowan writes:
> > > There is, as far as I can tell, no single term used in the Unicode
> > > Standard for what you are calling an "abstract character" above.
> > > I would like there to be one, myself.
> >
> > 10646 has the term "composite sequence".

John Cowan ( wrote:

> No, that won't work. We need a term for the underlying abstraction
> that can be represented either by a single (concrete) character
> or by a composite sequence. Ken Whistler has used the
> term "grapheme" (starting today). This term, AFAIK, is
> always collocated with "phoneme" and is used in discussions of
> text-to-speech conversion, speech-to-text conversion, and
> learning to read (which is a kind of text-to-speech conversion).
> Still, terminological buccaneering may be useful.

John Clews writes:

Why not do what ISO/IEC 10646 (and all ISO/IEC/JTC1/SC2 standards) do,
or at least imply? That is, use the term "character" for those elements
that can be related to a single code point, and "graphic character"
for the class of characters that can be related to more than one
code point, using any permitted combining options.
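To make the distinction concrete: the same visible letter "é" can be encoded either as one precomposed code point or as a composite sequence of a base letter plus a combining mark. The sketch below uses Python's standard `unicodedata` module, whose normalization forms (NFC/NFD) come from later Unicode work rather than from this 1997 thread; it is only an illustration of the "one abstraction, two encodings" point, not a definition from either standard.

```python
import unicodedata

# One visible letter, "é", encoded two different ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE: a single code point
composite = "e\u0301"    # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT: two code points

for label, s in [("precomposed", precomposed), ("composite", composite)]:
    names = [unicodedata.name(ch) for ch in s]
    print(f"{label}: {len(s)} code point(s): {names}")

# Compared code point by code point, the two strings differ...
print(precomposed == composite)  # False

# ...but canonical normalization maps each onto the other's form.
print(unicodedata.normalize("NFC", composite) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == composite)  # True
```

Whatever name is finally chosen, the underlying element here is the same in both cases; only the number of code points used to encode it differs.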

See the definitions section of ISO/IEC 10646 and/or other standards
from ISO/IEC/JTC1/SC2 for further details, and/or my own email of a
couple of days ago to the Unicode list.

Best wishes

John Clews

John Clews (Chair of ISO/TC46/SC2: Conversion of Written Languages)

SESAME Computer Projects, 8 Avenue Road, Harrogate, HG2 7PG, England Email:; tel: +44 (0) 1423 888 432

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:37 EDT