Definitions

From: Chris Jacobs (chris.jacobs@freeler.nl)
Date: Wed Nov 12 2003 - 19:33:21 EST

    "The interpretation of private use characters (Co) as graphic characters or not is determined by private agreement."
    "The interpretation of private use characters (Co) as base characters or not is determined by private agreement."
    "The interpretation of Private Use characters (Co) as combining characters or not is determined by private agreement."

    Is this just another way of saying that this is left undefined, or does it imply that a conformant application should be able to detect if private agreements exist?

    Are there any rules about the behavior of conformant processes that actually use these definitions?
      ----- Original Message -----
      From: Mark Davis
      To: Peter Kirk ; Unicode List
      Sent: Sunday, November 09, 2003 12:52 AM
      Subject: Re: ZWJ, ZWNJ, CGJ and combination

      The UTC just approved a clarification of the base character definition, as follows:

      D13a Graphic character: a character with the General Categories of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).

        a. Graphic characters specifically exclude the line and paragraph separators (Zl, Zp) and exclude the characters with the General Categories of Other (Cn, Cs, Cc, Cf).
        b. For more information, see Chapter 2, especially Section 2.4 Code Points and Characters and Table 2-2 Types of Code Points.
        c. Not all graphic characters have visibly rendered glyphs. Particular examples include spaces and some combining marks.
        d. The interpretation of private use characters (Co) as graphic characters or not is determined by private agreement. However, in the absence of private agreement, private use characters should be interpreted as graphic characters.
      D13b Base character: any graphic character except for those with the General Category of Combining Mark (M).

        a. Most Unicode characters are base characters. A base character is any code point that has one of the General Categories of Letter (L), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).
        b. Base characters are independent graphic characters, but this does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures.
        c. The interpretation of private use characters (Co) as base characters or not is determined by private agreement. However, in the absence of private agreement, private use characters should be interpreted as base characters.
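
As a rough illustration of D13a and D13b, the General Category tests can be sketched in Python with the standard `unicodedata` module. The function names and the `private_use_graphic` / `private_use_base` parameters are hypothetical; they merely stand in for "private agreement", and their defaults follow the "in the absence of private agreement" wording above. Results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Major General Category classes that D13a counts as graphic.
GRAPHIC_MAJOR = {"L", "M", "N", "P", "S"}


def is_graphic(ch, private_use_graphic=True):
    """Sketch of D13a. private_use_graphic models a private agreement (assumption)."""
    cat = unicodedata.category(ch)
    if cat == "Co":
        # Private use: decided by agreement; graphic by default per note d.
        return private_use_graphic
    # Zs is graphic; Zl, Zp, Cn, Cs, Cc and Cf are not.
    return cat[0] in GRAPHIC_MAJOR or cat == "Zs"


def is_base(ch, private_use_base=True):
    """Sketch of D13b. private_use_base models a private agreement (assumption)."""
    cat = unicodedata.category(ch)
    if cat == "Co":
        # Private use: decided by agreement; base by default per note c.
        return private_use_base
    # Any graphic character that is not a Combining Mark (M).
    return is_graphic(ch) and cat[0] != "M"
```

For example, `is_graphic("\u0301")` is true while `is_base("\u0301")` is false, since U+0301 COMBINING ACUTE ACCENT has General Category Mn.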
      D14 Combining character: a graphic character with the General Category of Combining Mark (M).

        a. The graphic positioning of a combining character depends on the last preceding base character. The combining character is said to apply to that base character.
        b. Combining characters consist of all characters with the General Category values of Spacing Combining Mark (Mc), Non-Spacing Mark (Mn), and Enclosing Mark (Me).
        c. All characters with non-zero canonical combining class (ccc) are combining characters, but the reverse is not the case: there are combining characters with a zero canonical combining class.
        d. The interpretation of Private Use characters (Co) as combining characters or not is determined by private agreement.
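
A similar sketch for D14, again using Python's `unicodedata` module. The name `is_combining` and the `private_use_combining` parameter are hypothetical. D14 itself states no default for private use characters; the `False` default here is an assumption that follows from the D13b default of treating them as base characters.

```python
import unicodedata


def is_combining(ch, private_use_combining=False):
    """Sketch of D14. private_use_combining models a private agreement (assumption)."""
    cat = unicodedata.category(ch)
    if cat == "Co":
        # Private use: decided entirely by agreement.
        return private_use_combining
    # Combining characters are exactly the Mn, Mc and Me categories.
    return cat in {"Mn", "Mc", "Me"}


def has_zero_combining_class(ch):
    """True when the canonical combining class of ch is 0."""
    return unicodedata.combining(ch) == 0
```

The asymmetry in the note on canonical combining class can be seen with U+0903 DEVANAGARI SIGN VISARGA: it is a combining character (Mc), yet `has_zero_combining_class("\u0903")` is true, whereas U+0301 COMBINING ACUTE ACCENT has a non-zero class.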

      Mark
      __________________________________
      http://www.macchiato.com
      ► शिष्यादिच्छेत्पराजयम् ◄


