Re: Emoji: emoticons vs. literacy

From: James Kass (
Date: Fri Dec 26 2008 - 23:29:32 CST

Asmus Freytag wrote,

>On 12/26/2008 5:55 AM, Doug Ewell wrote:
>> Christopher Fynn <cfynn at gmx dot net> wrote:
>>> If carriers start using Unicode instead of Shift JIS there are all
>>> kinds of currently "unused" characters available for them to abuse ~
>>> or they could come up with several different PUA encodings - and then
>>> later come up with a proposal to standardise these using non-PUA
>>> characters with the same argument of "interoperability" found in this
>>> proposal.
>> Isn't that exactly what happened with the current unified emoji
>> repertoire? The three vendors encoded their (different) sets of
>> pictures in different ranges of the Shift-JIS user-defined area, then
>> looked to Unicode to unify the three sets in a common range.
>One of the most important principles on which the Unicode effort was
>founded was to provide a unified encoding, to finally have one single
>representation for a character or symbol, instead of multiple competing
>character sets, all with different codes for the same item. In providing
>a unified encoding for items that exist (and are widely used) in
>fragmented character sets, Unicode is fulfilling one of its core
>missions. By not arbitrarily denying the needs of its users to have a
>unified representation of this new phenomenon, emoji, Unicode is
>redeeming a key promise to its implementers.

Unicode denies those emoji which are based on logos. The logos in
the charts are already stricken.

>> I think it should be clear that there is a significant body of
>> resistance to encoding these images in Unicode, although Asmus and
>> Mark and Ken (among others) are on board with them and that is
>> probably all it will take to get them encoded. They are a major
>> compromise to the basic principles that have guided Unicode since its
>> inception, in terms of what does and does not belong in a character
>> encoding standard. They establish a new principle, that a group of
>> 800-pound corporate gorillas can override the precedent of 15+ years
>> in determining what gets encoded.
>Well, I'm neither a gorilla, nor that much overweight, but thanks for
>your confidence that my opinion still matters ;-)
>I disagree, fundamentally, with the charge you are trying to lay at the
>UTC's doorstep. I think that is not deserved. It may be based on a
>misunderstanding of the fundamental nature of the Unicode project and
>the revolution in character encoding it initiated.

Our understanding of the fundamental nature of the Unicode project
is based on the text in the Unicode books. If something is said to be
based on a misunderstanding, then the text is poorly understood,
or it is poorly written, or there is no such misunderstanding.

>By aiming for a universal character set, Unicode is exposed to different
>constraints than designers of special-purpose character sets. In
>essence, a universal character set, in order to be universal, has to
>model the world (of character code usage). Unfortunately, that includes
>not only the high points of writing systems for modern and classical
>civilization, but also the warts. Because of that, it's impossible to be
>simultaneously in full control of what's considered an encodable
>character and achieve universal coverage.

I disagree that we have to be out-of-control in order to be universal.

This is a universal *standard*, which implies full control. We don't
see phrases such as "c u l8r" enshrined in dictionaries of standard
English. That's because *those* standardizers rightfully reject such
cruft, in spite of common usage by millions of people, thus retaining
essential control.

>If users persist in treating as characters something that you think should
>not be a character, you have only two choices: extend your definition of
>character, or stop being universal.

There are always alternatives (as you explore in your postscript).
Educating such users is another one.

Allowing users who have little or no conception of the technical
definition of "character" to determine "character-hood" degrades
the principles already established.

Visiting several web pages which explain how emoticons are used
shows us that the users and the companies offering these 'cute
little pictures' consider them to be just that: 'cute little pictures',
nothing more and nothing less.

Allowing icons and related signage into a plain text computer
encoding standard because multiple vendors have reportedly
complained about interoperability problems doesn't cinch the
argument. Standard interoperability is already denied by
denying the logos. We'd still like to see both of those problem
reports in order to determine the nature and severity of the
problem. Some of the brilliant people on this list may be able
to offer alternative solutions.

>> And I really don't want to hear again that the arguments against
>> encoding emoji are emotional and hysterical and opinionated, while the
>> arguments in favor of emoji are based on sound, logical reasoning.
>> There are facts and opinions on both sides.
>I fully expect sound, logical reasoning to understand and integrate the
>constraints I've outlined here. Anything that doesn't is indeed
>opinionated, and perhaps even hysterical.
>PS: Before I'm misunderstood - in terms of proposed characters, there is
>occasionally a third choice. A proposed character may be already
>encoded, or it may be possible to represent it with character sequences.

Tamil ligatures were rejected because they can already be expressed
using sequences of Unicode characters. (Even though 8-bit fonts and
some modern PUA schemes treat them as single characters/single
code points.) So we reject millions of Tamil users who are exchanging
real text while embracing Japanese phone companies who are exchanging
cute little pictures. (I do not favor separate encoding of Tamil ligatures.)
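
As a minimal sketch of what "sequences of Unicode characters" means here,
the snippet below spells out one Tamil conjunct as three encoded characters.
The choice of this particular conjunct and the variable names are
assumptions made for illustration; they are not taken from the rejected
proposal.

```python
import unicodedata

# One Tamil conjunct written as a sequence of three code points.
# The choice of this conjunct is illustrative only.
kssa = "\u0b95\u0bcd\u0bb7"

# Look up the formal character name of each code point in the sequence.
names = [unicodedata.name(ch) for ch in kssa]

for ch, name in zip(kssa, names):
    print(f"U+{ord(ch):04X} {name}")
```

A font with the appropriate shaping tables would normally draw the three
characters as one ligature, so no separate code point is needed.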

Likewise, all of these icons can be represented using Unicode plain-text
in the form of commonly used punctuation strings. Need more icons?
Make up more punctuation strings or use mark-up.

(To some, common usage makes something proper. To others, common
usage is only considered vulgar.)

>Sometimes, it may be something that can't be a character and must be
>handled elsewhere in the architecture.

Like now. Nobody is arguing against encoding symbols which have
currency *as symbols*. The fact that many vendors use pictures
of existing symbols as icons and exchange them as though they were
text while rejecting already extant valid Unicode characters (as in
the astrological symbols) should be (and is) instructive.

> None of these apply in the
>current discussion - the emoji are eminently supportable as characters,
>as is daily proven by millions of working implementations, and they are
>not duplicates or variants that can be unified with other characters.

*Anything* is eminently supportable as characters if there is a
private agreement to do so. Letters, digits, symbols, icons, signage,
logos, sounds, music, movies... Any of those would take less
bandwidth if we all agreed to standardized code points for them.

What these vendors have done is to make up a PUA scheme for
exchanging rich text. There's nothing wrong with that; that's
what the PUA is *for* (among other things).

These vendors assigned their graphic icons into conflicting user
defined areas of JIS. They solved their own interoperability
"problem" by making a consistent mapping into the Private
Use Area of Unicode. The "problem" is already solved; the
vendors are using the PUA of Unicode properly.

The experiment recently conducted on this list to attempt to
exchange "emoji" using the PUA was a smashing success. The
PUA code points arrived intact and could be returned intact.
(Unless, of course, one is stuck with some kind of Unix platform
which is locked into ISO 8859-1. Then the characters get munged.)
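
The round trip described above can be sketched roughly as follows. This is
not the experiment run on the list: the two PUA code points are arbitrary
placeholders, and round_trip is a hypothetical helper written for this
illustration.

```python
# U+E000..U+F8FF is the Private Use Area of the Basic Multilingual Plane.
# These two code points are arbitrary placeholders, not any vendor's mapping.
pua_text = "\ue63e\ue63f"

def round_trip(text, encoding):
    """Encode then decode text; return None if the encoding cannot hold it."""
    try:
        return text.encode(encoding).decode(encoding)
    except UnicodeEncodeError:
        return None

# A Unicode encoding form carries PUA code points through unchanged.
utf8_result = round_trip(pua_text, "utf-8")

# ISO 8859-1 has no slot for them, so a strict conversion fails...
latin1_result = round_trip(pua_text, "iso-8859-1")

# ...and a lossy conversion substitutes a replacement character.
lossy = pua_text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

print(utf8_result == pua_text, latin1_result, lossy)
```

The lossy case is the "munged" outcome: the original code points cannot be
recovered from the replacement characters.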

Best regards,

James Kass
