Re: What is the principle?

From: Peter Kirk
Date: Mon Mar 29 2004 - 17:39:37 EST


    On 29/03/2004 11:28, Kenneth Whistler wrote:

    > ...
    >Third, the proposal to "transfer ... some or all of the Variation
    >Selectors on the SSP to Private Use" is unclear on the concept of
    >Private Use. The UTC will make *no* semantic encoding commitment
    >regarding what a private use character is to be used for. That would
    >include *not* specifying that some range of Private Use characters
    >be dedicated to use as variation selectors (privately defined). ...

    The problem here is that, despite what you say, the UTC has already
    specified the character properties of all of the existing PUA
    characters, in a way which rules out their use as variation selectors,
    or as combining marks, or as right-to-left characters.
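
    One way to check this claim is to query the character properties that
    ship with a Unicode-aware runtime. The sketch below uses Python's
    standard unicodedata module; the helper name pua_properties and the
    SAMPLES table are illustrative only, and the values reported depend on
    the Unicode version bundled with the interpreter.

```python
import unicodedata

# First code point of each existing Private Use Area, plus one existing
# variation selector for contrast. The labels are illustrative only.
SAMPLES = {
    "BMP PUA": 0xE000,
    "Supplementary PUA-A (plane 15)": 0xF0000,
    "Supplementary PUA-B (plane 16)": 0x100000,
    "VARIATION SELECTOR-1": 0xFE00,
}


def pua_properties(code_point):
    """Return (General_Category, Bidi_Class, Canonical_Combining_Class)."""
    ch = chr(code_point)
    return (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.combining(ch),
    )


if __name__ == "__main__":
    for name, cp in SAMPLES.items():
        print(f"{name}: U+{cp:04X} -> {pua_properties(cp)}")
```

    The PUA samples should come back as general category Co with a
    left-to-right bidi class, whereas the variation selector is a
    nonspacing mark. Default_Ignorable_Code_Point is not exposed by
    unicodedata, so that property has to be read from
    DerivedCoreProperties.txt directly.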

    As an alternative to adjusting the definitions of the existing variation
    selectors, might it be possible for the UTC to adjust the character
    properties of parts of the Supplementary Private Use Areas? For example,
    a range of characters could be defined as default ignorable, default
    collation weight [.0000.0000.0000.0000] etc., and so these could be used
    as private variation selectors, or as private diacritical marks (which
    would simply disappear if viewed with a regular font; they would be in
    combining class 0 and so there would be no normalisation issues); and
    another range could be defined as RTL; and whatever other ranges might
    be required. Alternatively, an additional PUA could be defined to avoid
    changing the properties of existing characters. This cannot be in
    conflict with the principle that "The UTC will make *no* semantic
    encoding commitment regarding what a private use character is to be used
    for" because these kinds of properties have already been specified by
    the UTC for the existing PUA.

    > ...
    >Peter Kirk said:
    >>Surely Variation Selectors are "default ignorable" characters, which
    >>implies that if a process (including collation?) doesn't know what to do
    >>with them they should be ignored, i.e. treated as not present rather
    >>than as undefined characters.
    >
    >From DerivedCoreProperties.txt in the Unicode Character Database:
    >
    >FE00..FE0F ; Default_Ignorable_Code_Point # Mn [16] VARIATION
    >E0100..E01EF ; Default_Ignorable_Code_Point # Mn [240] VARIATION
    >
    >Please read the standard carefully regarding what "default ignorable"
    >means. TUS 4.0, p. 142:
    >"Default ignorable code points are those that should be ignored by
    >default in rendering unless explicitly supported. ..."
    >           ^^^^^^^^^
    >Some, like U+00AD SOFT HYPHEN, don't necessarily get the zeroes
    >treatment in the default collation table. Some, like U+034F COMBINING
    >GRAPHEME JOINER, while getting zero weights in the default table,
    >were added explicitly in order to make a potential distinction for

    Thanks for the clarification.

    >The *essential* concept of default ignorable characters is that
    >they consist of the class of characters which, if you don't know
    >what their impact on visual rendering is, you are better off
    >displaying *nothing* for them, rather than displaying the black
    >box (or other blort) indicating the presence of a nondisplayable

    This, as I see it, is also the *essential* concept of the private
    variation selectors which Ernest and I are suggesting. It seems that
    some further properties need to be defined. These would probably be
    similar to the default properties of the existing variation selectors -
    but not to those of CGJ, because something in the properties of CGJ has
    led at least some implementers to assume that it is ALWAYS ignored in
    rendering (and so is not passed to the rendering engine).
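
    The display rule quoted above can be sketched as a fallback renderer.
    This is a minimal illustration, not any real rendering engine: the
    names DEFAULT_IGNORABLE_SAMPLE, MISSING_GLYPH and fallback_render are
    hypothetical, the "font" is just a set of supported code points, and
    the table holds only the ranges and characters named in this thread,
    not the full Default_Ignorable_Code_Point set.

```python
# Ranges quoted from DerivedCoreProperties.txt in this thread, plus the two
# individual characters named in the discussion. NOT the complete
# Default_Ignorable_Code_Point set.
DEFAULT_IGNORABLE_SAMPLE = [
    (0x00AD, 0x00AD),    # SOFT HYPHEN
    (0x034F, 0x034F),    # COMBINING GRAPHEME JOINER
    (0xFE00, 0xFE0F),    # variation selectors in the BMP
    (0xE0100, 0xE01EF),  # variation selectors in the SSP
]

# WHITE SQUARE, standing in for the "black box" shown for a missing glyph.
MISSING_GLYPH = "\u25A1"


def is_default_ignorable(code_point):
    """True if the code point falls in one of the sample ranges above."""
    return any(lo <= code_point <= hi for lo, hi in DEFAULT_IGNORABLE_SAMPLE)


def fallback_render(text, supported):
    """Render text with a hypothetical font covering `supported` code points."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in supported:
            out.append(ch)             # the font knows this character
        elif is_default_ignorable(cp):
            continue                   # unknown but ignorable: show nothing
        else:
            out.append(MISSING_GLYPH)  # unknown and not ignorable: show a box
    return "".join(out)


if __name__ == "__main__":
    font = {ord("a"), ord("b")}
    # U+FE00 is dropped silently; the PUA character U+E000 becomes a box.
    print(fallback_render("a\uFE00b\uE000", font))
```

    With the properties as they stand today, an unsupported PUA character
    falls into the last branch and is shown as a box; a private variation
    selector would need to fall into the middle branch instead.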

    Peter Kirk (personal) (work)
