Re: Emoji: emoticons vs. literacy

From: Doug Ewell
Date: Sun Jan 04 2009 - 22:53:12 CST

    Asmus Freytag <asmusf at ix dot netcom dot com> wrote:

    > It's an attempt to separate the two facets of compatibility: One is
    > based on interoperability needs being the primary base for encoding
    > the character. The other is based on a character having a
    > compatibility decomposition. The latter are the ones that could be
    > called "compatibility variants", because they can be considered a
    > variant of an existing (ordinary) character.
    > (In discussions like this, I personally prefer the term "ordinary" in
    > place of the more cumbersome circumlocution "normal (that is,
    > non-compatibility)".)

    I don't see any definition of "compatibility character" in the TUS book
    that refers to this first facet, that is, a character that is
    *completely unrelated* to any other character in the standard but is
    encoded due to "interoperability needs." The entry for "compatibility
    character" in the Glossary is simply a truncation of the longer
    definition in Section 2.3, and in fact the Glossary entry directs the
    reader to the full description in Section 2.3.

    The whole purpose for calling emoji "compatibility characters" seems to
    be to exempt them from the normal stated guidelines of what is and is
    not a candidate for encoding.

    If you (Ken, Mark, anyone) can show me a definition of "compatibility
    character" that refers to this "interoperability needs" aspect and does
    not assume any relationship between these characters and "ordinary"
    characters, I would appreciate it. This has to be a definition long
    enough to stand on its own, not just a selective truncation of a longer
    text (so excluding the Glossary entry), and it has to be found somewhere
    within the Unicode 5.0 or 5.1 text, including standard annexes.

    If no such definition can be found, then I have to assume this
    "interoperability needs" argument was created solely for the purpose of
    admitting the emoji set.

    > It should be immediately obvious, that not all characters needed for
    > interoperability (compatibility) can be guaranteed to have an ordinary
    > character counterpart. Therefore, some characters that look like
    > ordinary characters (because they don't have a compatibility
    > decomposition) are in fact encoded for compatibility.

    It is not immediately obvious to me. Can you give some examples of
    currently encoded compatibility characters that have no ordinary
    character counterpart, but were encoded solely for compatibility with
    external, post-1993 standards?
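
    The second facet, at least, can be checked mechanically. What follows
    is a minimal sketch, assuming Python's standard unicodedata module; the
    helper name has_compat_decomposition is hypothetical and does not come
    from any Unicode text. It reports only whether a character carries a
    tagged (compatibility) decomposition mapping.

```python
import unicodedata

def has_compat_decomposition(ch: str) -> bool:
    """Return True if ch has a compatibility (tagged) decomposition mapping."""
    # unicodedata.decomposition() returns '' when there is no mapping.
    # Compatibility mappings begin with a tag such as '<compat>' or '<circle>';
    # canonical mappings carry no tag.
    return unicodedata.decomposition(ch).startswith("<")

# A ligature, a circled digit, a precomposed letter, and a plain symbol.
for ch in ("\uFB01", "\u2460", "\u00E9", "\u2603"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {has_compat_decomposition(ch)}")
```

    A False result means only that no compatibility mapping exists. It
    cannot show whether the character was encoded for interoperability
    reasons, because the decomposition data does not record that.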

    > The set of emoji (and also emoticons) are composed of many ordinary
    > characters (straightforward symbols), plus compatibility characters
    > that do not have a decomposition.

    Of the images in the "Table for Working Draft Proposal" that do not
    already have a Unicode code point, I don't think I see more than 20 or
    so that are composed of existing, ordinary characters. Practically all
    are new images.

    >> At least now when I see a black-and-white statement such as "Unicode
    >> does not encode idiosyncratic, personal, novel, or private-use
    >> characters, nor does it encode logos or graphics," I know how to
    >> interpret it.
    > Yes, "graphics" is not a very well-defined term ;-)

    As discussed over the past two weeks, there are some things like the
    letter "A" which clearly fall on one side of the text/graphics
    continuum, and other things like the Venus de Milo which clearly fall on
    the other side. There is a substantial gray area in between. I think
    we can agree on that. Now, guess which side I think CLINKING BEER MUGS
    belongs on.

    > And "novel" would have encompassed the Euro sign before 2002, yet it
    > was coded well in advance of the actual introduction of that currency.

    EURO SIGN is not an ideal example. It was well known and undisputed in
    1998 that this symbol would become ubiquitous and globally important
    within a few years. The restriction against novel characters was
    clearly and explicitly intended to exclude characters whose importance
    and/or staying power was unknown. (Principles and Procedures, section
    H.10: "The euro sign... is a novel symbol for which there is
    demonstrated and strong demand.")

    And even if EURO SIGN did break the rule against "novel" symbols, there
    was only one of them, not 618.

    >> I've been a huge and vocal supporter of the Unicode Standard for the
    >> past 16 years, back before most people had heard of it, and this is
    >> by far the most disappointed I have ever been in the Standard. This
    >> decision will come back to haunt Unicode again and again.
    > First, there hasn't been a decision. Certainly not a final one. So
    > it's a bit premature to express things this way.

    I and others have already been told, publicly and privately, to stop
    arguing against inclusion of the entire emoji set, because "'resistance'
    is not helpful" and "the decision to encode the emoji as individual code
    points does not need to be revisited." Doesn't sound to me like a
    particularly bumpy road to UTC approval. The real tough questions might
    have to come from member bodies in WG2.

    > Second, if you've been around that long, you might have heard about
    > similar discussions where people were predicting bad outcomes from
    > certain decisions. Surprisingly enough, things didn't always turn out
    > as badly as predicted. Some issues, after being hotly contested and
    > taking truly enormous bandwidth in the committee, and on the lists,
    > have sunk out of sight without a trace, the minute they were decided
    > (and seem to have had no observable impact on the standard).
    > Astonishing, but true.

    One of the better examples, I concede, was the encoding of the math
    alphabets, which did not (to some people's surprise) result in
    widespread use of these symbols for bold, italic, etc. markup in plain
    text. (My MathText application, which performed this kind of abuse, was
    an April Fool's parody.) Neither the approval nor the subsequent
    deprecation of the Plane 14 tags caused any lasting harm, though I still
    contend the deprecation solved nothing. The encoding of Phoenician
    separately from Square Hebrew does not appear to have ruined
    text-searching capability for Middle Eastern scholars. And even the old
    flames about CJK unification seem to have died down.

    But all of these issues, except arguably Plane 14, had something to do
    with characters in a writing system. Even mathematical formulas,
    two-dimensional though they may be, are still composed from what most
    people would call "writing." There are a great many images in the emoji
    set that have nothing whatsoever to do with writing, nor layout control,
    nor text meta-information, nor symbols with semantic value. They are
    cute little wiggling pictures of balloons and party poppers.

    None of the other issues required such revisionism of basic principles
    of Unicode and 10646. There is nothing in the 1400-page TUS 5.0 book
    that stretches the meaning of "compatibility characters" to encompass
    wiggling pictures of balloons and party poppers. That's a retrofit.

    > Third, I really hope that no single issue can affect your support for
    > the standard, if it's sustained you for 16 years so far.

    I can no longer tell myself or anyone else that such-and-so character or
    symbol is something that Unicode would or would not consider encoding.
    Everything is up in the air now.

    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
