L2/03-041

Source: Mark Davis
Date: Feb 10, 2003
Title: Base Character Definition D13

I got a question from someone here about the exact definition of "base character". I took a look at it, and the definition is very badly written. We have:

D13 Base character: a character that does not graphically combine with preceding characters, and that is neither a control nor a format character.

- Most Unicode characters are base characters. This sense of graphic combination does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures.

D14 Combining character: a character that graphically combines with a preceding base character. The combining character is said to apply to that base character.

...

In determining what D13 actually means in practice, one might start by analyzing it as follows:

- it is a character (so remove Cn, Cs)
- it is not a control or format character (so remove Cc, Cf)
- it is not a combining character (so remove Mc, Mn, Me).

But this is not exactly crystal clear. In particular, Zl and Zp (line/paragraph separators) are not mentioned at all, so this reading would count them as base characters, although they certainly should be excluded.

The two definitions D13 and D14 are also circular, and a non-spacing mark *can* apply to a spacing combining mark (e.g. in some Indic combinations), even though a spacing combining mark is not a base character under the current D13. The definition also does not mention private use characters.

My suggestion for a fix is in line with our new Grand Character Typology in chapter 2.

D13 Base character: a spacing graphic character, specifically excluding control and format characters.

- In terms of General Category values, a base character is any code point that has one of the categories Letter, Number, Punctuation, Symbol, Space Separator, or Spacing Combining Mark. (In other words, it excludes the values Cn, Cs, Cc, Cf, Zl, Zp, Mn, and Me.) The interpretation of Private Use characters (Co) is determined by the implementation.

- Most Unicode characters are base characters. This sense of graphic combination does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures.

D14 Combining character: a character that graphically combines with a preceding base character. The combining character is said to apply to that base character. Also known as combining mark.

- Combining characters consist of all characters with the General Category values of Spacing Combining Mark (Mc), Non-Spacing Mark (Mn), and Enclosing Mark (Me).

...
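As a rough illustration, the proposed D13 and D14 can be checked mechanically from General Category values alone. The sketch below is not part of the proposal. It assumes Python's standard `unicodedata` module, whose data reflects the Unicode version bundled with the interpreter rather than the version current when this note was written. The function names and the `private_use_is_base` flag are hypothetical; the flag stands in for the implementation-defined treatment of Co. For comparison, it also encodes the first-cut reading of the current D13.

```python
import unicodedata

# Proposed D13: General Category values that make a code point a base character.
BASE_CATEGORIES = frozenset({
    "Lu", "Ll", "Lt", "Lm", "Lo",              # Letter
    "Nd", "Nl", "No",                          # Number
    "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",  # Punctuation
    "Sm", "Sc", "Sk", "So",                    # Symbol
    "Zs",                                      # Space Separator
    "Mc",                                      # Spacing Combining Mark
})

# Proposed D14: General Category values of combining characters.
COMBINING_CATEGORIES = frozenset({"Mc", "Mn", "Me"})

# First-cut reading of the current D13: remove only these values.
NAIVE_EXCLUDED = frozenset({"Cn", "Cs", "Cc", "Cf", "Mc", "Mn", "Me"})


def is_base_char(ch: str, private_use_is_base: bool = False) -> bool:
    """Proposed D13. Co is implementation-defined, so the caller decides
    via the (hypothetical) private_use_is_base flag."""
    cat = unicodedata.category(ch)
    if cat == "Co":
        return private_use_is_base
    return cat in BASE_CATEGORIES


def is_combining_char(ch: str) -> bool:
    """Proposed D14: any character whose category is Mc, Mn or Me."""
    return unicodedata.category(ch) in COMBINING_CATEGORIES


def naive_is_base(ch: str) -> bool:
    """First-cut reading of the current D13; it lets Zl, Zp and Co through."""
    return unicodedata.category(ch) not in NAIVE_EXCLUDED


if __name__ == "__main__":
    samples = ["A", " ", "\u0301", "\u093E", "\u20DD",
               "\u2028", "\u2029", "\u200D", "\n", "\uE000"]
    for ch in samples:
        name = unicodedata.name(ch, "<no name>")
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}: "
              f"base={is_base_char(ch)} combining={is_combining_char(ch)} "
              f"naive_base={naive_is_base(ch)}")
```

Running it shows the gaps discussed above: U+2028 LINE SEPARATOR (Zl) and the private use character U+E000 (Co) pass the first-cut reading. Under the proposed wording, U+093E DEVANAGARI VOWEL SIGN AA (Mc) is classed as both a base character and a combining character.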