Re: What are the present criteria...

From: Asmus Freytag <>
Date: Thu, 18 Aug 2011 10:24:31 -0700

On 8/18/2011 7:29 AM, Doug Ewell wrote:
> Karl Pentzlin<karl dash pentzlin at acssoft dot de> wrote:
>> The quoted indicators for benefit were part of a concern of the German
>> NB regarding the Wingding/Webding proposals. The concern expressed in
>> WG2 N4085 is that some characters proposed there conform neither to
>> the policy statements by UTC or WG2, nor to the indicators of benefit
>> which the German NB would accept as an additional reason to encode
>> Wingding/Webding characters beyond the formal policies of UTC and WG2.
> Nevertheless, N4085 is a German NB document, the criteria in question
> are those suggested by the German NB and not WG2 (and the document makes
> note of this distinction), and it is an error to portray this passage as
> representing either a change or a lack of clarity in UTC or WG2 policy.
Karl makes no such claim. The document states that 2093-2096 appear to
be in violation of the character glyph model. I believe that's the
section (or one of the sections) in the document that Karl summarizes
here as "policy statements by UTC or WG2" - at least it would fit.

Anyway, it's more useful to focus on the actual concerns than on
whether Karl summarized them correctly in his email.

The German NB introduces the concept of "indicator" of "benefit [to] the
user", and then defines that as:
- evidence of actual use
- evidence that it's likely a wrong character might be used for lack of
an encoded character
- conformance to other standards
(I've slightly rephrased for clarity).

I have several problems with this approach.

First, these "indicators" are rather haphazardly compiled. Overwhelming
evidence of plain text use and conformance requirements are already
recognized as valid reasons to encode characters (not just symbols).
They do not, however, help in evaluating those proposals where more
nuanced judgement is required. The remaining element, that the wrong
character might be mistakenly used, is of overriding concern only in
particular cases where questions of unification or disambiguation need
to be decided.

Second, it's really unsatisfactory if each NB has its own criteria for
when to add characters to the standard, and it's especially unsettling
when such criteria seem to be applied ad hoc to a given repertoire.
WG2 and Unicode have had lengthy discussions and broad consensus about
the kinds of criteria to take into account when encoding characters in
general or symbols in particular.

The result has been captured in a number of documents, for example,
here's the original one from the UTC: (with links to more
recent versions).

Unlike the list in N4085, the criteria adopted by UTC and WG2 are not
formulated as PASS / FAIL. Instead, they were carefully designed to be
used in assigning weight in favor or in disfavor of encoding a
particular symbol as a character. This recognizes an important
principle, which has been notably absent in much recent discussion: it
is generally not possible to create any set of criteria that can be
applied mechanistically (or algorithmically). The decision to encode a
character is and remains a judgement call. Some calls are easy, because
the evidence is overwhelming and direct, some calls are more difficult,
because the evidence may be uncertain or indirect, or the nature of the
proposed character may not be as well understood as one would ideally
like.

Recognizing these inherent difficulties in the encoding work and the
need for a set of weighing factors instead of simplistic PASS / FAIL
criteria was one of the early breakthroughs in the work of WG2 and UTC.
Accordingly the documents speak not of criteria "whether" to encode
characters, but criteria that "strengthen (resp. weaken) the case for
encoding". That's a crucial difference.

While the details of these criteria (or factors) can and should be
evaluated from time to time for continued appropriateness, the soundness
of the general methodology is not in question, and UTC and WG2 should
resist any attempts (directly or indirectly) to abandon them in favor of
an unworkable, simplistic, and ad-hoc PASS / FAIL approach.

What are relevant criteria?

The document I cited lists the original set of criteria as follows

          What criteria strengthen the case for encoding?

    The symbol:

      * is typically used as part of computer applications (e.g. CAD
      * has well defined user community / usage
      * always occurs together with text or numbers (unit, currency,
      * must be searchable or indexable
      * is customarily used in tabular lists as shorthand for
        characteristics (e.g. check mark, maru etc.)
      * is part of a notational system
      * has well-defined semantics
      * has semantics that lend themselves to computer processing
      * completes a class of symbols already in the standard
      * is letterlike (i.e. should vary with the surrounding font style)

          What criteria weaken the case for encoding?

    There is evidence that:

      * the symbol is primarily used freestanding (traffic signs)
      * the notational system is not widely used on computers (dance
        notation, traffic signs)
      * the symbol is part of a set undergoing rapid changes
      * the symbol is trademarked (unless requested by the owner)
        (logos, Der grüne Punkt, CE symbol, UL symbol, etc)
      * the symbol is purely decorative
      * it’s ok to ignore its identity in processing
      * font shifting is the preferred access and the user community is
        happy with it (logos, etc.)

    Or, conversely, there is not enough evidence for its usage or its
    user community.

These criteria as originally formulated don't spell out how to evaluate
"widely used in plain text" and "required for compatibility with another
standard or for round-trip mapping", because the criteria were concerned
with issues that are specific to symbols. Wide usage and compatibility
requirements apply to characters of any kind and tend to short-circuit
detailed evaluation of individual characteristics of characters anyway.

The requirement for compatibility is the primary factor that should
apply to the characters 2093-2096 discussed in the German document. If
one agrees with the premise of encoding the Web/Wingding sets as
"compatibility sets", then the compatibility requirement covers all the
characters in them, even though, just like other compatibility
characters already encoded, they may violate some aspects of the
character-glyph model.

Received on Thu Aug 18 2011 - 12:29:18 CDT