Re: Mixed up priorities

From: Mark E. Davis
Date: Sat Oct 23 1999 - 15:27:50 EDT

I haven't been able to follow the list much lately, so I am coming in at the
end. A few points:

1. The 10646 definition is so broad as to be meaningless. A track on my CD is
also a character by that criterion.

2. It is really best to avoid the term character altogether in an encoding
context, because it is so overloaded. See my discussion of this on, which has
an introduction that discusses characters, glyphs, graphemes, code points,
code units.

Phrased in those terms, 'ch' in Slovak is a grapheme that is represented with
2 code points; similarly, 'å' is a grapheme in Danish that is represented
with either 1 or 2 code points, while 'ksha' is a grapheme in Hindi that is
represented with 3 code points.
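
As a rough illustration of the counts above, here is a minimal Python sketch.
The sample strings, the labels, and the helper names code_points and
utf16_units are assumptions chosen for illustration; they are not taken from
the discussion referenced above. It assumes 'å' is encoded either as U+00E5
or as 'a' followed by U+030A COMBINING RING ABOVE, and 'ksha' as KA + VIRAMA
+ SSA (U+0915 U+094D U+0937).

```python
import unicodedata


def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]


def utf16_units(text):
    """Return the number of 16-bit code units in the UTF-16 form of text."""
    return len(text.encode("utf-16-le")) // 2


# Hypothetical sample graphemes, one entry per case mentioned in the message.
samples = {
    "Slovak ch": "ch",
    "Danish a-ring (precomposed)": "\u00E5",
    "Danish a-ring (decomposed)": "a\u030A",
    "Hindi ksha": "\u0915\u094D\u0937",
}

for label, text in samples.items():
    print(
        f"{label}: {len(text)} code point(s) {code_points(text)}, "
        f"{utf16_units(text)} UTF-16 code unit(s), "
        f"{len(text.encode('utf-8'))} UTF-8 code unit(s)"
    )

# The two spellings of a-ring are canonically equivalent: normalizing to
# NFC composes them, and normalizing to NFD decomposes them.
assert unicodedata.normalize("NFC", "a\u030A") == "\u00E5"
assert unicodedata.normalize("NFD", "\u00E5") == "a\u030A"
```

In each case a reader perceives one grapheme, yet the number of code points
(and of code units, depending on the encoding form) differs, which is why
counting "characters" without saying which of these units is meant gives
ambiguous answers.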


Michael Everson wrote:

> Ar 15:16 -0700 1999-10-22, scríobh Ashley Yakeley:
> >>Abstract character: A unit of information used for the organization,
> >>control, or representation of textual data. [This is the ISO 10646
> >>definition of "character".]
> No it isn't. From ISO/IEC 10646-1:1993:
> 4.6 character: A member of a set of elements used for the organisation,
> control, or representation of data.
> >For this reason, I like to say that in Slovak, 'ch' is a composite
> >character that's made up of two other characters.
> No, it is two characters treated as a single letter in a certain context.
> A letter is an element of an alphabet, which itself is a structured
> collection of graphic symbols used to represent one or more languages,
> having specific elements representing vowels and consonants.
> --
> Michael Everson * Everson Gunn Teoranta *
> 15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
> Guthán: +353 1 478 2597 ** Facsa: +353 1 478 2597 (by arrangement)
> 27 Páirc an Fhéithlinn; Baile an Bhóthair; Co. Átha Cliath; Éire
