Re: Emoji: emoticons vs. literacy

From: Michael D'Errico
Date: Fri Jan 09 2009 - 01:38:56 CST

    > Your suggestion, Michael, is to modify how the Unicode standard works in
    > order to encode emoji and similar non-text content in a flexible and
    > extensible way. My suggestion is that this content belongs in a
    > different standard altogether, one that is focused on non-text content.

    I've thought about this. But since you would want to intermix text
    and non-text, it makes sense to retain Unicode as a subset and use
    the same UTF encoding schemes. The problem, though, is that Unicode
    claims all the code points, so a new standard would have to violate
    the rules, either by using planes that Unicode will probably never
    use(*), or by going beyond plane 16 (which is impossible with UTF-16
    and specifically disallowed for UTF-8 and UTF-32 conformance).
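
To illustrate why UTF-16 cannot reach past plane 16, here is a minimal sketch of the surrogate-pair arithmetic. The function name `decode_surrogate_pair` and the constant `MAX_UTF16_CODE_POINT` are made up for this example; only the surrogate ranges and the formula come from the UTF-16 definition.

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF):
        raise ValueError("high surrogate out of range")
    if not (0xDC00 <= low <= 0xDFFF):
        raise ValueError("low surrogate out of range")
    # Each surrogate contributes 10 bits; the result is offset by 0x10000
    # because the BMP (plane 0) is encoded directly, without surrogates.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest possible pair gives the last code point UTF-16 can express.
MAX_UTF16_CODE_POINT = decode_surrogate_pair(0xDBFF, 0xDFFF)

if __name__ == "__main__":
    print(hex(MAX_UTF16_CODE_POINT))   # last reachable code point
    print(MAX_UTF16_CODE_POINT >> 16)  # its plane number
```

The largest pair lands on the last code point of plane 16, so there is no pair of surrogates left over to address anything higher.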

    Personally, I would choose the latter approach and just say that you
    can't use UTF-16. UTF-8, even limited to 4 bytes, can encode a total
    of 32 planes, so there would be lots of initial room. Expanding it
    to 6 bytes, as it was originally specified, handles 32k planes.
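
The plane counts above follow from counting payload bits in the UTF-8 bit layout. A rough sketch of that arithmetic, with helper names (`utf8_payload_bits`, `planes_reachable`) invented for this example:

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane


def utf8_payload_bits(length: int) -> int:
    """Payload bits in a UTF-8 style sequence of `length` bytes (1 to 6)."""
    if length == 1:
        return 7  # 0xxxxxxx
    if not 2 <= length <= 6:
        raise ValueError("sequence length must be between 1 and 6")
    # The lead byte spends `length` one-bits and a zero-bit on the marker,
    # leaving 7 - length payload bits; each continuation byte (10xxxxxx)
    # carries 6 payload bits.
    return (7 - length) + 6 * (length - 1)


def planes_reachable(length: int) -> int:
    """Whole planes addressable by sequences of up to `length` bytes."""
    return (1 << utf8_payload_bits(length)) // PLANE_SIZE


if __name__ == "__main__":
    for n in (3, 4, 6):
        print(n, utf8_payload_bits(n), planes_reachable(n))
```

Four bytes give 21 payload bits, which is 32 planes; six bytes give 31 bits, which is 32,768 planes. Unicode itself uses only the first 17 of these (planes 0 through 16).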

    The problem with moving beyond the reach of UTF-16 is that some
    programming languages designed their String classes to hold UTF-16
    code units, and would therefore not be able to access the non-text
    content. This is probably the biggest roadblock to a solution
    outside of Unicode, and means that either Unicode would have to give
    up some of its code space to a new standard, or embrace the ideas
    and make them a part of Unicode.

    Well, I won't be holding my breath....


    *Whistler's Conjecture states that no characters will ever be encoded
    beyond plane 2.
