Re: more dingbats in plain text

From: Asmus Freytag
Date: Fri Apr 17 2009 - 18:56:35 CDT


    On 4/17/2009 4:30 PM, Doug Ewell wrote:
    > Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
    >> Example: most traffic symbols like DEER CROSSING or SPEED LIMIT 30
    >> should probably not be encoded as characters. The STOP sign or the
    >> European CAUTION sign, however, are examples of common symbols that
    >> deserve status as characters. You find them as part of texts where
    >> they retain their customary shape, but don't refer to traffic, but
    >> are used in a generalized sense. Hence, they have become _common_
    >> symbols.
    > The stop sign, like "pictures of cows," is another canonical example
    > presented in the WG2 "Principles and Procedures" document (updated
    > less than a year ago) of what should *not* be encoded. It's
    > interesting to see further evidence of how loosely the principles are
    > applied, in spite of all the protests that UTC is following the same
    > principles in encoding emoji that it followed two decades ago.
    If that kind of thing amuses you, try reading the introduction to the
    Unicode Standard. The early versions boldly declared many things off
    limits that later came to pass, from 32-bit character codes to musical
    symbols.

    I don't see this as problematic. Many of these changes are the direct
    consequence of Unicode's success. Rather than shoehorn everybody and
    everything mercilessly into the 1988 view of what a global, universal
    character set should be, the developers of the standard have wisely
    adapted to critical needs and allowed the standard to reflect the
    experience gained in developing and implementing it. That's an
    unquestioned strength of the Unicode Standard.

    In that process, the principles have acted, and continue to act, as
    valuable guideposts. Ideally, all encoding problems and needs can be
    covered within the boundaries they demarcate. When that's not
    possible, a critical and thorough evaluation is performed that looks
    not only at whether a problem is important enough to address at all,
    but also at whether it should give rise to an exception, or to a
    reformulation of the principles.

    For those areas where users and implementers MUST be able to rely on
    enforceable restrictions, you have the Unicode Stability Guarantees.
    These are critical rules that MUST NOT be violated by changes to the
    standard. But there's a reason that principles and stability
    guarantees are not one and the same thing.
    >> It's not sufficient to just point at sets of symbols for that - you
    >> also need to isolate which ones are _common_ symbols in each set,
    >> according to the definition of this concept that I've proposed here.
    > Unless they can be defined as "compatibility characters," in which
    > case all of them must be encoded without question.
    Unless the set has been approved as a compatibility character set, in
    which case the goal is, indeed, to cover it in full.
    (That decision does not rest with the proposer, no matter how much you
    would like to insinuate it.)

    The "sets of symbols" I was addressing in that part of my message,
    however, did not include compatibility character sets, but sets organized
    by category or type of symbol, like ISO safety symbols, UI symbols, etc.


    PS: I've removed the Emoji list from the cc, since this discussion did
    not get started there, nor is it specific to the emoji proposals.
