Re: Emoji: emoticons vs. literacy

From: Michael D'Errico (mike-list@pobox.com)
Date: Fri Jan 09 2009 - 16:30:44 CST

  • Next message: André Szabolcs Szelp: "Re: Emoji and cell phone character sets..."

    >> I've thought about this. But since you would want to intermix text
    >> and non-text, it makes sense to retain Unicode as a subset and use
    >> the same UTF encoding schemes. The problem, though, is that Unicode
    >> claims all the code points, so a new standard would have to violate
    >> the rules, either by using planes that Unicode will probably never
    >> use(*), or by going beyond plane 16 (which is impossible with UTF-16
    >> and specifically disallowed for UTF-8 and UTF-32 conformance).
    >
    > So you've come back to the original problem and realized that
    > Unicode cannot save the world: you can't use one single encoding
    > to represent every kind of data, since different data requires
    > different binary representations based on its characteristics, at
    > least if efficiency is the goal.

    No, I didn't realize that. What I realized is that Unicode is in
    effect hoarding all of the possible UTF-16 code points even though
    it will never need or use planes 4, 5, 6, 7, 8, 9, A, B, C, or D.
    Unicode also slams the door on an extension standard that uses
    plane 17 and above, since it is non-conformant for UTF-8 or
    UTF-32 to address code points beyond plane 16. In addition, even
    if you accept non-conformant UTF-8, you will run into the
    limitation of the many programming languages that use UTF-16
    internally and cannot address plane 17 or higher at all.

    So, really, the answer is that this has to be done in the unused
    Unicode planes, at least until programming languages migrate to
    UTF-8 internally. Again, I'm not going to hold my breath.

    Mike



    This archive was generated by hypermail 2.1.5 : Fri Jan 09 2009 - 16:33:31 CST