Re: Emoji: Public Review December 2008

From: Asmus Freytag
Date: Sat Dec 20 2008 - 19:14:43 CST

I'm surprised at how much of this discussion appears to be driven by
prior conviction and how many of the arguments that are being made seem
to become emotional. Many contributors seem to base their input purely
on a value judgment of what they deem appropriate types of text.

I think that the strength of Unicode has always been its almost
single-minded focus on universality. Sure, there are limitations, but
they are based on how Unicode fits into the overall architecture of the
global computing environment, not on the nature of the text, or the
nature of the group of users.

Architecturally, Unicode is designed to address plain text. Over time,
the shared understanding of what is plain text has evolved - starting
initially from the type of plain text seen in plain text environments
such as old-style e-mail, for example, and later being expanded to
encompass codes for the underlying text entities in markup languages,
even if they aren't fully usable outside of such protocols. The sets of
symbols for musical and mathematical notation contain quite a number of
characters that are only fully functional when used with a full music
composition system or mathematical layout (such as MathML).

Nevertheless, the underlying elements are entities that can and should
properly be encoded as plain text elements, so that they can be treated
more uniformly inside the overall architecture. Sure, there were
pre-existing SGML entity sets for them, but it proved beneficial to use
Unicode to consistently encode the semantics of the entire range of
these symbols, rather than leaving some of them to entity sets (which
are limited to SGML-like environments). The benefit to implementers of
these markup languages of having a single, consistent representation for
the entire textual "backbone" of a markup document is enormous.

That emoji act functionally like plain text elements, both in the way
they fit into the architecture of numerous existing implementations and
in the way they are interchanged - about these facts there can be no
reasonable disagreement. Pretending otherwise is not grounded in the
observable facts, but rather appears based on prior convictions and value
judgments of a sort which, I believe, have no place in the development of
the Unicode Standard.

Suggestions like endorsing permanent private use code assignments or
inventing special, stateful, mini-markup for these characters, are
likewise driven by the desire to express a value judgment, and not by
careful analysis of the technical requirements. Some of these
suggestions were made by people whose sound technical judgments I had
come to trust. I will have to be more careful in the future: these
suggestions, if acted on, would do more harm to the Unicode Standard
than admitting even an unexamined set of symbol characters.

What is needed most, at this juncture, is not further opinionizing about
the value of these proposed characters, but the detailed work of sorting
them into the standard. There are enough hard questions to be answered:

1) Are there entities that can't be encoded for exceptional reasons?
2) What are the semantic distinctions and range of semantics to be encoded?
3) What to do about semantic distinctions normally represented by text?
4) What to do about naming?

Some general observations about these questions.

To 1: Logos are so far the only exception that should be applied on
first principles. It is implicitly recognized that logos are plain text
in a technical sense, but that there are overriding concerns that don't
permit them to be treated as characters.

To 2: This requires a careful look into the nature of the proposed
characters. Some are presented as fairly generic embodiments of a
particular semantic (e.g. factory) when it is well known that in other
environments *different* symbols would be used. In such cases it's
important that the name and annotations chosen reflect the fact that the
symbol to be encoded is *not* the most generic one, and perhaps is only
the generic symbol in the context relevant to the subset of emoji
symbols. Getting this wrong will not impact the technical aspects of
supporting Emoji in Unicode, but will make it difficult to correctly
support other sets of symbols at a later time.

To 3: Mathematical symbols assign semantic value to what would be
stylistic variation in other contexts. In principle, the same could be
applied to color distinctions for emoji. Some emoji codes would require
color for correct rendering, but color would otherwise remain limited to
markup. Such a solution would be entirely parallel to what was done for
mathematical alphabets -- however -- if there's a possible mapping to a
range of textures (black, white, lined, hatched) that would be an
acceptable way of handling the situation, so as to be able to sidestep
this issue entirely for now, and perhaps for ever.

To 4: Naming is always a subject where everyone has an opinion. Names
don't matter unless they appear to express constraints on usage or
glyphic representation of a character that don't exist. Naming symbols
based on conventional meaning can imply that they cannot be used for
more than one meaning - however, that can be addressed by annotations
and (informative) aliases. Naming based on graphical constituent parts
can be misleading if the symbols aren't really always constructed from
the same parts (and such naming is exceedingly cumbersome). Striving for
perfect names is less helpful than avoiding clear-cut blunders.

Having said all this, why can't I find more of a discussion of
individual characters from the proposal, e.g. in the light of the four
questions I outlined above?

