Re: Emoji: emoticons vs. literacy

From: Michael D'Errico
Date: Thu Jan 08 2009 - 12:47:21 CST

    >> The short answer is that *everyone* benefits from having
    >> a standard that promotes interoperability of text interchange
    >> globally without data corruption.
    > ... is this about _text_ interchange, and specially plain text?

    The limitation of Unicode to plain text is actually just a policy.
    The emoji may not be text, but they do communicate an idea. Unicode
    should be about enabling communication, not just that communication
    which happens to use fonts. (Note I'm not saying Unicode should be
    used for all forms of communication, but text-ness should not be an
    absolute requirement.)

    Unicode is two orthogonal things: a technology for representing a
    series of numbers (all the UTFs), and a partial mapping of numbers
    to plain-text characters. The fact that all assigned numbers so far
    are plain-text characters is the result of the original design goal
    for Unicode, which as you know was to create a universal character
    set to subsume all others.
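
    To illustrate the two halves separately, here is a minimal Python
    sketch. The helper name describe() is my own invention, not part
    of any standard; it relies only on the standard unicodedata module.
    The encoding half works for any code point, while the mapping half
    has nothing to say about a private use code point such as U+E000.

```python
import unicodedata

def describe(cp: int) -> dict:
    """Show both halves of Unicode for one code point (hypothetical helper)."""
    ch = chr(cp)
    return {
        "code_point": f"U+{cp:04X}",
        # Half 1: the encoding forms, which carry any number equally well.
        "utf8": ch.encode("utf-8"),
        "utf16_be": ch.encode("utf-16-be"),
        # Half 2: the partial mapping from number to character.
        "category": unicodedata.category(ch),
        "name": unicodedata.name(ch, None),  # None when no name is assigned
    }

latin_a = describe(0x0041)  # an assigned plain-text character
pua = describe(0xE000)      # first code point of the BMP private use area

print(latin_a)
print(pua)
```

    The private use code point encodes without complaint, but its
    general category is "Co" (private use) and it has no name, so the
    standard itself assigns it no meaning.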

    The private use areas in Unicode present a problem in that there is
    absolutely no limitation placed on what those numbers can represent
    in an application. If, as in the case of the emoji, these numbers
    (code points) start leaking out of an application, the UTC is faced
    with either saying, "not my problem" or with encoding them. The
    long-standing policy of only encoding plain text characters is at
    odds with the fact that the PUA does not need to be used strictly
    for plain text. It has been entertaining to see the calisthenics
    required to justify the emoji as plain text.
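
    As a rough sketch of what "leaking out" looks like in practice, a
    receiving application could at least detect private use code points
    in incoming text. The function names below are hypothetical; the
    three ranges are the private use areas defined by the standard
    (U+E000..U+F8FF, plus planes 15 and 16 without their last two code
    points). The sample string uses an arbitrary PUA code point and is
    not taken from any real carrier's emoji set.

```python
# The three private use ranges defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP private use area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)

def is_private_use(cp: int) -> bool:
    """Return True if the code point lies in a private use range."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

def find_private_use(text: str) -> list:
    """List (index, 'U+XXXX') for each private use code point in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_private_use(ord(ch))
    ]

# Arbitrary example: a message carrying one PUA code point.
print(find_private_use("hi \ue63e!"))
```

    Detection is the easy part; nothing in the text tells the receiver
    what the sender meant by that number, which is the whole problem.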

    This is a general problem that needs a solution. As Unicode gains
    acceptance and people start realizing all the neat things they can
    do with the PUA, the UTC will find itself turning many away simply
    because they used the PUA in a non-text way. "Use XML" is not the
    standard response I hope to see. I'd prefer that the UTC provide
    guidance on how to use the PUA in a way that facilitates the
    move from PUA to Unicode proper. I've outlined one possible way to
    do it, but would love to see any other ideas.

