L2/03-041R2

Re: Base Character Definition D13
From: Mark Davis
Date: 2003-02-10 (Updated 2003-08-26)

[This document was updated per the UTC discussion on 2003-08-26; see the part below the horizontal line.]

I got a question from someone here about the exact definition of "base character". I took a look, and the definition is very badly written. The current text reads:

D13 Base character: a character that does not graphically combine with preceding characters, and that is neither a control nor a format character.

D14 Combining character: a character that graphically combines with a preceding base character. The combining character is said to apply to that base character.

In determining what D13 actually means in practice, one might start by analyzing it as follows:
- it is a character (so remove Cn, Cs)
- it is not a control or format (so remove Cc, Cf)
- it is not a combining character (so remove Mc, Mn, Me).
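The three-step reading above can be sketched with Python's standard unicodedata module, to see which characters it admits. This is only an illustration of that informal reading, not text from the standard; the names EXCLUDED and is_base_by_naive_reading are hypothetical, and the results depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# General Categories removed by the three steps listed above.
# (Illustrative only; this set is an assumption drawn from that list.)
EXCLUDED = {"Cn", "Cs", "Cc", "Cf", "Mc", "Mn", "Me"}


def is_base_by_naive_reading(ch: str) -> bool:
    """Return True if ch survives the three-step reading of D13."""
    return unicodedata.category(ch) not in EXCLUDED


if __name__ == "__main__":
    samples = {
        "LATIN SMALL LETTER A": "a",
        "COMBINING ACUTE ACCENT": "\u0301",
        "LINE FEED (control)": "\n",
        "LINE SEPARATOR": "\u2028",
        "PARAGRAPH SEPARATOR": "\u2029",
        "PRIVATE USE U+E000": "\ue000",
    }
    for name, ch in samples.items():
        # Print the General Category next to the verdict of the naive reading.
        print(name, unicodedata.category(ch), is_base_by_naive_reading(ch))
```

Under this reading, any category that is not named simply passes through, so the outcome for categories the list never mentions is decided by omission rather than by intent.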

But this reading is not exactly crystal clear. Zl and Zp (the line and paragraph separators) are not explicitly mentioned, but must be accounted for. The two definitions D13 and D14 are also circular: D14 relies on the notion of a base character, while D13 relies on the notion of graphically combining. Neither the definition nor its notes mention private use characters. We propose the following fix to the text, in line with our new Grand Character Typology in Chapter 2, for the next appropriate version of the Unicode Standard:


D13a Graphic character: a character with the General Categories of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).

D13b Base character: any graphic character except for those with the General Category of Combining Mark (M).

D14 Combining character: a character with the General Category of Combining Mark (M).
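Because the proposed definitions are stated purely in terms of General Category, they can be checked mechanically. The following is a minimal sketch using Python's standard unicodedata module; the function names is_graphic, is_combining, and is_base are hypothetical, chosen only to mirror D13a, D14, and D13b, and the results depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Major classes named in proposed D13a; Space Separator (Zs) is handled
# separately because Zl and Zp are not graphic under the proposal.
GRAPHIC_MAJOR_CLASSES = {"L", "M", "N", "P", "S"}


def is_graphic(ch: str) -> bool:
    """Proposed D13a: L, M, N, P, S, or Zs."""
    cat = unicodedata.category(ch)
    return cat[0] in GRAPHIC_MAJOR_CLASSES or cat == "Zs"


def is_combining(ch: str) -> bool:
    """Proposed D14: General Category of Combining Mark (M)."""
    return unicodedata.category(ch).startswith("M")


def is_base(ch: str) -> bool:
    """Proposed D13b: any graphic character that is not a Combining Mark."""
    return is_graphic(ch) and not is_combining(ch)
```

With these predicates the circularity disappears: D14 no longer refers to base characters, and characters in categories outside D13a (controls, format characters, Zl, Zp, private use, surrogates, unassigned) are neither base nor combining.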