Re: Emoji: Public Review December 2008

From: Asmus Freytag (
Date: Sun Dec 21 2008 - 01:56:23 CST

On 12/20/2008 9:19 PM, Doug Ewell wrote:
> Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
>> I'm surprised at how much of this discussion appears to be driven by
>> prior conviction and how many of the arguments that are being made
>> seem to become emotional. Many contributors seem to base their input
>> purely on a value judgment of what they deem appropriate types of text.
> My input has been based on:
> (a) where the WG2 "Principles and Procedures" document (N3452) says
> the line should be drawn,
> (b) where the Unicode Consortium and WG2 have drawn the line for the
> past 15 years, and
> (c) what the most respected authorities within UTC, including Asmus,
> have said for the past 10-plus years about where the line should be
> drawn.
The curious thing is that a, b and c are all evolving along with the
standard. It's a myth to think that there was ever a bright line that
was so clearly cut that you could evaluate a character's encodability as
if by plugging a few attributes into a formula and getting a YES/NO answer.

The "most respected authorities" would all agree that character encoding
- to a large degree - involves not black vs. white decisions, but more
often than not happens in a deep gray zone no man's land.
>> Architecturally, Unicode is designed to address plain text. Over
>> time, the shared understanding of what is plain text has evolved -
>> starting initially from the type of plain text seen in plain text
>> environment such as old-style e-mail, for example, and later being
>> expanded to encompass codes for the underlying text entities in
>> markup languages, even if they aren't fully usable outside of such
>> protocols. The sets of symbols for musical and mathematical notation
>> contain quite a number of characters that are only fully functional
>> when used with a full music composition system or mathematical layout
>> (such as MathML).
> and BLOOD TYPE A and ROASTED SWEET POTATO and POOP fit into the modern
> shared understanding of what is plain text.
Value judgment?
> I grant that there are many symbols in this collection that are
> communicative in nature, and would fit comfortably within Unicode.
> There are many others that are not, and would not. The decision to
> encode the entire set, the inappropriate ones as well as the
> appropriate, appears from here to have been based on the clout and
> prestige of the requesters, not the appropriateness of the symbols.
It's not what's appropriate to include in text, but what does and
doesn't fit the architectural model. (more below).
>> That emoji act functionally like plain text elements the way that
>> they fit into the architecture of numerous existing implementations
>> and that they are interchanged - about these facts there can be no
>> reasonable disagreement.
> Japanese cell-phone vendors are using these symbols as plain text
> characters. About this fact, there can be no reasonable disagreement.
> As to whether a symbol like ROASTED SWEET POTATO carries any
> communicative value, beyond being a picture of a roasted sweet potato,
> there can be plenty of disagreement.
Irrelevant, perhaps. We never question the mathematicians (or the
linguists, or the philologists) about what they are putting in their
notational systems or whether the ancients should or should not have
used certain characters. I think that there can be no disagreement about
the fact that these have communicative value to their users.
> N3452 specifically mentions "pictures of cows" and "stop sign" as
> examples of symbols that should not be encoded. Naturally it is a bit
> of a surprise to see so much official and expert support behind the
> encoding of COW and TRAFFIC LIGHT.
Right. And as I wrote before, subject to change. Therefore, a future
revision of this document is likely to use different examples. The
Unicode Standard has contained language trying to define the scope. This
language has had to be changed over time, because the understanding of
what is and isn't plain text has evolved. It's still the case that one
doesn't need the catalog of street signs as Unicode, because nobody is
using this full set to communicate in text. The STOP sign is a different
matter - it's becoming something that I can definitely imagine being
used in interchange without literally being an encoding of a traffic sign.

The reason we don't see it more often in this manner is technological -
not fundamental.

The decisive difference in the case of the emoji is that some other
entities have created means for the interchange of such symbols as part
of (functional) plain text. Unlike the analysis of printed or written
matter, or projection of what might become needed (currency symbols),
this is a case where the plain text nature and usage is not up to
Unicode to decide - I suggest it exists and that fact can be observed.

What makes me a bit sad in this context is that it took clout at all to
get people to overlook their prejudices and admit the facts in the case.

You see, there's a difference between somebody saying, in effect, it
would be cute to have a picture of a cow, so we could write about cows
using the picture, and a user group that is already off creating texts
with cows standing in for whatever they stand in for in their texts. In
one case, Unicode would standardize ahead of actual usage - and that's a
very dicey game, best avoided - and in the other case, it's trailing
actual usage, and dragging its feet - and that's not good.
>> Pretending otherwise does not speak from the observable facts, but
>> rather appears based on prior convictions and value judgments of a
>> sort, which, I believe, have no place in the development of the Unicode
>> Standard.
> This statement is highly subjective. I'm reading N3452. If that
> represents prior convictions and value judgments, then they are not
> mine, but those of WG2 -- which is admittedly not UTC. (I suppose it
> would be interesting to know what the official WG2 position is on
> emoji, and how WG2 reconciles support for emoji with support for N3452.)
The overall thrust of Unicode, and one of the major reasons it won such
wide acceptance, has been its focus on universality. (The other reasons
don't enter this discussion, except the equally strong resistance to
allowing code extension mechanisms that are stateful - that's what killed
and put the stake through the heart of the original DIS of 10646, by the
way.)

Within this overall thrust, documents like N3452 are trying to document
the best current understanding of possible guidelines -- in an attempt
to make the process work better. However, these guidelines are amended
at nearly every meeting as the committee (and its liaisons) reach a
better understanding of their tasks.

The concern about the emoji characters is not driven by concerns for the
language of N3452, but based on value judgments about the entities.
Otherwise, people would take a more dispassionate attitude and reflect
that characters widely used as plain text in existing, commercially
viable implementations, are clearly subject to encoding by Unicode and
that they fit under the overall universalist mandate. Therefore, along
with encoding the characters, the guidelines have to be updated (again)
to match.
>> Suggestions like endorsing permanent private use code assignments or
>> inventing special, stateful, mini-markup for these characters, are
>> likewise driven by the desire to express a value judgment, and not by
>> careful analysis of the technical requirements. Some of these
>> suggestions were made by people whose sound technical judgments I had
>> come to trust. I will have to be more careful in the future: these
>> suggestions, if acted on, would do more harm to the Unicode Standard
>> than admitting even an unexamined set of symbol characters.
> Well, thank goodness I never suggested any of these.
I'm glad - because it was stateful code-shifting mechanisms that brought
down the house on the first DIS of 10646. Its many other warts helped,
but that was the deal breaker.
>> What is needed most, at this juncture, is not further opinionizing
>> about the value of these proposed characters, but the detailed work
>> of sorting them into the standard. There are enough hard questions to
>> be answered:
> So, in other words, the decision to encode the entire set has been
> made, and resistance is futile.
I'm not in a position to speak for the UTC, or even vote in the UTC.
But, yes, I think "resistance" is not helpful.
>> 1) Are there entities that can't be encoded for exceptional reasons?
>> 2) What are the semantic distinctions and range of semantics to be
>> encoded?
>> 3) What to do about semantic distinctions normally represented by
>> text styles?
>> 4) What to do about naming?
>> ...
>> To 3: Mathematical symbols assign semantic value to what would be
>> stylistic variation in other context. In principle, the same could be
>> applied to color distinctions for emoji. Some emoji codes would
>> require color for correct rendering, but color would otherwise remain
>> limited to markup. Such a solution would be entirely parallel to what
>> was done for mathematical alphabets -- however -- if there's a
>> possible mapping to a range of textures (black, white, lined,
>> hatched) that would be an acceptable way of handling the situation,
>> so as to be able to sidestep this issue entirely for now, and perhaps
>> for ever.
>> ...
>> Having said all this, why can't I find more of a discussion of
>> individual characters from the proposal, e.g. in the light of the
>> four questions I outlined above?
> Fine, here's a question related to item #3, and the individual
> characters to which item #3 relates:
> If the proposal is being made to establish some sort of cross-hatching
> scheme to represent colored images, similar to that used for heraldic
> tinctures, then what sort of scheme shall be used for animated images?
A default mapping of hatching to colors would be a great thing to
propose - it could come in handy in other situations.

> Several of the proposed images, especially those present only in the
> KDDI and SoftBank collections, are attested only as animations. How,
> for example, are we supposed to distinguish between CHICK and HATCHING
> CHICK unless our fonts and rendering engines (or printed pages)
> support animation?
I think "hatching chick" has a definite semantics that does not
*require* animation. I would be generally in favor of encoding as many
of these as possible in a way that does not burden Unicode with encoding the
animation aspect as such. A hypothetical "Alarm clock ringing" could be
rendered with animation, but could also be shown with little partial
outlines to the side of the clock indicating (statically) that it's
moving (cartoon style). I would encourage UTC to attempt to document as
many symbols as possible in such a way that actual animation becomes a glyph variant.

> --
> Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14

This archive was generated by hypermail 2.1.5 : Fri Jan 02 2009 - 15:33:07 CST