From: Peter Kirk (firstname.lastname@example.org)
Date: Mon Mar 29 2004 - 17:39:37 EST
On 29/03/2004 11:28, Kenneth Whistler wrote:
>Third, the proposal to "transfer ... some or all of the Variation
>Selectors on the SSP to Private Use" is unclear on the concept of
>Private Use. The UTC will make *no* semantic encoding commitment
>regarding what a private use character is to be used for. That would
>include *not* specifying that some range of Private Use characters
>be dedicated to use as variation selectors (privately defined). ...
The problem here is that, despite what you say, the UTC has already
specified the character properties of all of the existing PUA
characters, in a way which rules out their use as variation selectors,
or as combining marks, or as right-to-left characters.
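The fixed PUA properties can be checked directly. Here is a short Python sketch using the standard `unicodedata` module. That module reflects the Unicode version the local Python was built with, not necessarily 4.0, although the PUA properties shown have not changed. It prints the General_Category, Bidi_Class and canonical combining class of a few PUA code points, with a variation selector and a Hebrew letter for contrast. The PUA samples come out as Co / L / 0: not a mark, and strongly left-to-right.

```python
import unicodedata

def char_props(cp: int) -> tuple[str, str, int]:
    """Return (General_Category, Bidi_Class, Canonical_Combining_Class)
    for a code point, as recorded in Python's copy of the UCD."""
    ch = chr(cp)
    return (unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            unicodedata.combining(ch))

SAMPLES = [
    ("U+E000 (BMP Private Use Area)", 0xE000),
    ("U+F0000 (Supplementary PUA-A)", 0xF0000),
    ("U+100000 (Supplementary PUA-B)", 0x100000),
    ("U+FE00 VARIATION SELECTOR-1", 0xFE00),
    ("U+05D0 HEBREW LETTER ALEF", 0x05D0),
]

if __name__ == "__main__":
    for label, cp in SAMPLES:
        print(f"{label:32} {char_props(cp)}")
```

A renderer or bidi implementation that follows these properties will treat any PUA character as a left-to-right base character, whatever a private agreement says it means.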
As an alternative to adjusting the definitions of the existing variation
selectors, might it be possible for the UTC to adjust the character
properties of parts of the Supplementary Private Use Areas? For example,
a range of characters could be defined as default ignorable, default
collation weight [.0000.0000.0000.0000] etc., and so these could be used
as private variation selectors, or as private diacritical marks (which
would simply disappear if viewed with a regular font; they would be in
combining class 0 and so there would be no normalisation issues); and
another range could be defined as RTL; and whatever other ranges might
be required. Alternatively, an additional PUA could be defined to avoid
changing the properties of existing characters. This cannot be in
conflict with the principle that "The UTC will make *no* semantic
encoding commitment regarding what a private use character is to be used
for" because these kinds of properties have already been specified by
the UTC for the existing PUA.
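To illustrate what the zero collation weight would buy, here is a minimal Python sketch. It assumes a purely hypothetical sub-range, U+FFF00..U+FFF0F in Supplementary Private Use Area-A, that has been given default-ignorable, all-zero-weight properties. No Unicode version defines such a range. The sort key is a toy single-level key and not the Unicode Collation Algorithm. It only shows that a character with collation element [.0000.0000.0000.0000] drops out of the comparison, so "a" followed by a private variation selector ties with plain "a".

```python
import unicodedata

# HYPOTHETICAL: a sub-range of Supplementary Private Use Area-A that the UTC
# would designate as default ignorable with collation weights
# [.0000.0000.0000.0000]. No Unicode version defines this range.
PRIVATE_VS_FIRST = 0xFFF00
PRIVATE_VS_LAST = 0xFFF0F

def is_private_vs(cp: int) -> bool:
    return PRIVATE_VS_FIRST <= cp <= PRIVATE_VS_LAST

def toy_sort_key(s: str) -> tuple[int, ...]:
    """Toy single-level key: one weight per code point (its scalar value).

    Not the UCA. A character whose collation element is all zeros
    contributes nothing, so it is simply skipped here.
    """
    return tuple(ord(ch) for ch in s if not is_private_vs(ord(ch)))

if __name__ == "__main__":
    vs = chr(PRIVATE_VS_FIRST)
    # Today these code points are ordinary PUA characters with combining
    # class 0, so canonical reordering during normalization never moves them.
    print(unicodedata.category(vs), unicodedata.combining(vs))  # Co 0
    words = ["b", "a" + vs, "a"]
    # "a"+vs and "a" get identical keys, so they tie (sorted() is stable).
    print(ascii(sorted(words, key=toy_sort_key)))
```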
>Peter Kirk said:
>>Surely Variation Selectors are "default ignorable" characters, which
>>implies that if a process (including collation?) doesn't know what to do
>>with them they should be ignored, i.e. treated as not present rather
>>than as undefined characters.
>From DerivedCoreProperties.txt in the Unicode Character Database:
>FE00..FE0F ; Default_Ignorable_Code_Point # Mn  VARIATION
>E0100..E01EF ; Default_Ignorable_Code_Point # Mn  VARIATION
>Please read the standard carefully regarding what "default ignorable"
>means. TUS 4.0, p. 142:
>"Default ignorable code points are those that should be ignored by
>default in rendering unless explicitly supported. ..."
>Some, like U+00AD SOFT HYPHEN, don't necessarily get the zeroes
>treatment in the default collation table. Some, like U+034F COMBINING
>GRAPHEME JOINER, while getting zero weights in the default table,
>were added explicitly in order to make a potential distinction for
Thanks for the clarification.
>The *essential* concept of default ignorable characters is that
>they consist of the class of characters which, if you don't know
>what their impact on visual rendering is, you are better off
>displaying *nothing* for them, rather than displaying the black
>box (or other blort) indicating the presence of a nondisplayable
This, as I see it, is also the *essential* concept of the private
variation selectors which Ernest and I are suggesting. It seems that
some further properties need to be defined. These would probably be
similar to the default properties of the existing variation selectors,
but not to those of CGJ, because something in the properties of CGJ has
led at least some implementers to assume that it is ALWAYS ignored in
rendering (and so is not passed to the rendering engine).
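The difference between "display nothing if you don't know it" and "always strip it before rendering" can be shown with a minimal Python sketch. It makes several assumptions. Only the default-ignorable ranges discussed in this thread are listed, and the real Default_Ignorable_Code_Point property covers more. The "font" is a hypothetical set of supported code points. Real fonts support variation *sequences* (base plus selector) rather than bare selectors. U+25A1 WHITE SQUARE stands in for the black box. The `strip_then_render` function models the behaviour reported for CGJ, where the character never reaches the engine even when the font could use it.

```python
# Only the default-ignorable ranges mentioned in this thread; the real
# Default_Ignorable_Code_Point property (DerivedCoreProperties.txt) has more.
DEFAULT_IGNORABLE_RANGES = [
    (0x034F, 0x034F),    # COMBINING GRAPHEME JOINER
    (0xFE00, 0xFE0F),    # VARIATION SELECTOR-1..16
    (0xE0100, 0xE01EF),  # VARIATION SELECTOR-17..256
]

MISSING_GLYPH = "\u25A1"  # WHITE SQUARE, standing in for the "black box"

def is_default_ignorable(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_RANGES)

def render(text: str, font_supports: set[int]) -> str:
    """Toy renderer: keep what the (hypothetical) font supports, show
    nothing for unsupported default ignorables, a box for anything else."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in font_supports:
            out.append(ch)
        elif is_default_ignorable(cp):
            continue
        else:
            out.append(MISSING_GLYPH)
    return "".join(out)

def strip_then_render(text: str, font_supports: set[int]) -> str:
    """Pipeline that assumes default ignorables are ALWAYS ignored and
    removes them before the renderer (or font) ever sees them."""
    kept = "".join(ch for ch in text if not is_default_ignorable(ord(ch)))
    return render(kept, font_supports)

if __name__ == "__main__":
    plain_font = {ord("a"), ord("b")}
    vs_aware_font = plain_font | {0xFE00}
    print(ascii(render("a\uFE00b", plain_font)))                # 'ab'
    print(ascii(render("a\uE000b", plain_font)))                # 'a\u25a1b'
    print(ascii(render("a\uFE00b", vs_aware_font)))             # 'a\ufe00b'
    print(ascii(strip_then_render("a\uFE00b", vs_aware_font)))  # 'ab'
```

The last line shows the problem: with the stripping pipeline, a font that knows what to do with the selector never gets the chance.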
-- 
Peter Kirk
email@example.com (personal)
firstname.lastname@example.org (work)
http://www.qaya.org/