Re: more dingbats in plain text

From: Asmus Freytag
Date: Fri Apr 17 2009 - 14:38:07 CDT

    On 4/17/2009 8:33 AM, Johannes Bergerhausen wrote:
    > I would like to say that these symbols (the first version is from
    > 1993) are used worldwide on the same line of text mixed with other
    > characters like latin.
    > Another example to attest that there is a need to put international
    > symbols like public signage in the UCS.

    If you can show that a symbol is being used inline, that would satisfy
    *one* of several criteria that should be met in order to encode it as a
    character in Unicode.

    There is some disagreement in the character coding community about which
    other criteria need to be satisfied to proceed with encoding any new
    symbol.

    If the symbol is part of a recognized *notation* then there seems to be
    widespread agreement that it should be encoded.

    If the symbol is already encoded in another character set, then it
    should be encoded *as long* as there's agreement that the other
    character set needs to be supported for compatibility.

    Beyond that, things are more difficult.

    One group of people firmly believes that if a symbol has been used with
    special fonts in rich text, that's proof that anyone needing this symbol
    already has a means to use it, and there's no need for encoding it. As
    you can easily surmise, this position, if taken to an extreme, disallows
    the encoding of any such symbols.

    Other people disagree - they feel it should be possible to search for
    symbols (as well as to use symbols in plain text). If you apply the
    second position to its fullest, you might want to encode all symbols
    ever used inline.

    My position is somewhat in the middle: I think that there are some
    symbols (more than currently encoded) that occur rather frequently. I
    call them "common" symbols. They usually have a well-defined
    appearance, which makes them highly recognizable, but they can often be
    used in a variety of contexts and with a variety of meanings, given by
    context.

    Because of their nature, they are highly versatile and useful - I would
    not hesitate to predict that they would end up being more often used
    than many of the rare or historic characters among the scripts in
    Unicode. That potential for widespread (for symbols) usage makes them
    attractive for standardization in my view.

    There are many sets of symbols of specialized nature, some extremely
    rigidly defined, for example the ISO set of warning signs (occupational
    hazards and the like). Despite their precise definitions, such sets (as
    a whole) would make poor targets for standardization as characters. The
    reason is that, by their nature, most of the symbols in these sets are
    very highly specialized, and therefore occur rarely, if at all, in
    inline text. However, many such specialized sets contain one or two, or
    a few, widely known and used symbols.

    To standardize anything represents a cost. For rare characters, such
    costs are in poor relation to the benefits - just as Unicode started out
    with encoding the widely used scripts first, widely used symbols should
    be encoded first - even if that means one has to provide an arbitrary
    cutoff that separates the common from the uncommon symbols *within* each
    category or set of symbols.

    Example: most traffic symbols like DEER CROSSING or SPEED LIMIT 30
    should probably not be encoded as characters. The STOP sign or the
    European CAUTION sign, however, are examples of common symbols that
    deserve status as characters. You find them as part of texts where they
    retain their customary shape, but don't refer to traffic, but are used
    in a generalized sense. Hence, they have become _common_ symbols.

    Having encoded the _common symbols_ from a set of symbols, it's a
    fallacy to think that this then requires also encoding all the other
    symbols from that set, no matter how specialized. That's different from
    encoding scripts.

    The current Japanese-oriented additions (ARIB, Emoji) have added or will
    add many such common symbols. We've since learned that the technology
    that makes those symbols available for inline messages is spreading
    outside Japan.

    Therefore, what would be most useful in looking to "attest" symbol
    characters as you call it, would be to categorize the missing _common_
    symbols that relate to European (and other non-Japanese) usage.

    It's not sufficient to just point at sets of symbols for that - you also
    need to isolate which ones are _common_ symbols in each set, according
    to the definition of this concept that I've proposed here.

    I keep hoping that someone with the resources, time and interest will
    take on that project.

