Feedback on the Updated Emoji Encoding Proposal (=L2/08-081)

Date: 2008-02-05
Authors: Kat Momoi, Markus Scherer
Includes feedback from Asmus Freytag and George Rhoten
Proposal: see http://www.unicode.org/L2/L2008/08081-emoji-wd.html

Responses marked with "Markus Scherer" are mostly from collaboration with Kat Momoi.

Asmus Freytag to the Symbols SC list 2008-02-01:
1) daggers: Unicode has a single dagger - I see no reason why these can't
be unified with the existing character, even though the double dagger in
Unicode is a stacked dagger, so this would introduce an alternate form
of "double" dagger.

Michael Everson 20080201:
The typographic daggers are not knives.

Markus Scherer 20080204:
Nor are they part of the proposal. Like everything else in section 8,
they are included only for reference, and are not part of the

2) repetitive symbols: Unicode has a precedent for encoding multiples
of basic characters as units, instead of insisting that they be
represented as a sequence of characters: ellipses, integrals, primes,
etc. I see no reason to insist that fast forward and reverse, and
similar symbols be encoded as sequences of triangles. These should have
been encoded as units long ago (when the eject symbol was added).

[Markus Scherer 20080204: Support for encoding KDDI 8, as opposed to using a pair of U+25B6. Confirmed with Asmus 20080205.]
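
The precedent cited in this comment can be checked against the Unicode Character Database. The following is a minimal Python sketch, assuming the standard unicodedata module; the PRECEDENTS table and the helper name are illustrative and not part of the proposal. Each unit character listed has a compatibility decomposition into a repeated base character, which is the same relationship a unit fast-forward symbol would have to a pair of U+25B6.

```python
import unicodedata

# Unit characters that Unicode encodes alongside their repeated base character.
PRECEDENTS = {
    "\u2026": "\u002E" * 3,  # HORIZONTAL ELLIPSIS -> three FULL STOPs
    "\u2033": "\u2032" * 2,  # DOUBLE PRIME -> two PRIMEs
    "\u222C": "\u222B" * 2,  # DOUBLE INTEGRAL -> two INTEGRALs
}

def expands_to_sequence(unit: str) -> str:
    """Return the compatibility (NFKC) expansion of a unit character."""
    return unicodedata.normalize("NFKC", unit)

for unit, sequence in PRECEDENTS.items():
    expansion = expands_to_sequence(unit)
    print(unicodedata.name(unit), "->", [f"U+{ord(c):04X}" for c in expansion])
```

The unit characters remain distinct code points; only compatibility normalization folds them into the sequence.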

3) geometric symbols in color: if there's a simple red/green or
orange/blue color dichotomy, mapping that to black/white would seem to
be an appropriate solution (rather than mapping only green to black).

[Asmus Freytag 20080205:
"Geometric Symbols" are squares, triangles etc. Your sets are full of examples of red/green squares of different sizes and blue/orange triangles or other shapes.

You've mapped the green one in many cases to the black symbol in Unicode.

What I'm suggesting is that you decide whether green is "white" or "black" and then map the green/red pair to the white/black pair (or the black/white pair).

The same goes for blue/orange pairs.

In a quick look at these I have not found instances where you had all colors of the rainbow for the *same* symbol, so mapping the complementary color pairs
to the black/white pair in Unicode allows you to map many more of the geometrical symbols than you currently do.

My comment aims at a general strategy. Either you decide that the colors in the Emoji sets are in fact in pairs, and that this means you can map a color pair
to the black and white pair, or you decide not to.]
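
The general strategy described here could be sketched as follows in Python. This is only an illustration under stated assumptions: the DARK_COLOR policy (which color of each pair counts as "black") is hypothetical, the shape labels are made up for the example, and only two existing Unicode black/white pairs are listed.

```python
import unicodedata

# Hypothetical policy: which color of each complementary pair counts as "black".
# The choice per pair is arbitrary; what matters is applying it consistently.
DARK_COLOR = {("red", "green"): "green", ("orange", "blue"): "blue"}

# Existing Unicode black/white pairs for a few geometric shapes.
SHAPE_PAIRS = {
    "square": ("\u25A0", "\u25A1"),       # BLACK SQUARE, WHITE SQUARE
    "up triangle": ("\u25B2", "\u25B3"),  # BLACK / WHITE UP-POINTING TRIANGLE
}

def map_colored_shape(shape: str, color: str) -> str:
    """Map a colored emoji shape to the black or white Unicode character."""
    for pair, dark in DARK_COLOR.items():
        if color in pair:
            black, white = SHAPE_PAIRS[shape]
            return black if color == dark else white
    raise ValueError(f"{color!r} is not part of a known color pair")

for color in ("green", "red"):
    print(color, "square ->", unicodedata.name(map_colored_shape("square", color)))
```

Deciding the policy once per color pair, rather than per symbol, is what lets both members of each pair find a mapping target.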

Also, 5.1 introduces several additional geometric symbols, which could
be mapped, for example the extra large squares.

[Asmus Freytag 20080205:
You also don't appear to have looked at block 2B00 from Unicode 5.1 (which among other symbols should contain an additional pair of squares), or the
2980 block from Unicode 5.0 (curved arrows).
Doing so will yield a few additional mappings.]

4) curved arrows. It appears that the block of arrows at 2900 was not
considered for mapping. See U+2934 etc.

[Markus Scherer 20080204: KDDI 731=U+2934 and KDDI 732=U+2935?]
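
The two candidate code points can be inspected with a short Python sketch, assuming the standard unicodedata module. The CANDIDATES table only restates the tentative pairing from the response above, and the helper name is illustrative.

```python
import unicodedata

# Tentative pairing from the response above (still marked with a question mark).
CANDIDATES = {"KDDI 731": "\u2934", "KDDI 732": "\u2935"}

def in_arrows_block(ch: str) -> bool:
    """True if ch lies in the block starting at U+2900 (U+2900..U+297F)."""
    return 0x2900 <= ord(ch) <= 0x297F

for source, ch in CANDIDATES.items():
    print(source, f"U+{ord(ch):04X}", unicodedata.name(ch), in_arrows_block(ch))
```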

4) Other symbols with color dichotomy: Instead of coding a heavy check
mark 2 for the red alternate of a red/blue pair, it would be better to
code a white (outlined) form in Unicode and then consistently map one
color to the black and one to the white form, as for all the other
symbols with color dichotomy.

[Markus Scherer 20080204: Instead, change KDDI 132 to unify with U+2713 Check Mark.]

5) Clock faces, computer/document icons, as well as a rather significant
number of other symbols are present in the suite of Wingdings fonts
distributed by Microsoft. A cross mapping to these would be a useful
exercise - not the least because these fonts represent existing black
and white interpretations of the glyph shape(s) for such symbols. These
glyphs might represent possible starting points for representative
glyphs, should these characters be encoded.

[Markus Scherer 20080204: Good suggestion, but not immediately necessary for the discussion of encoding. We will try to cross-map with Wingdings after UTC #114. Volunteers for cross-mapping with Wingdings would be appreciated.]

6) My last comment assumes that the intent would be to identify (where
possible) the actual universally applicable abstract character, not
simply a mapping target for a particular element of these particular
set(s) of emoji. My rough estimate is that between 30% and 50% of the
characters from the cross mapping table correspond to a reasonably
generic symbol. That's the one that should be identified and encoded in
those cases. For the remainder, the emoji sets are the primary if not
only environment where the symbol is used. In those cases I see no need
to shoehorn what is clearly a novel character into an identification
with any pre-existing symbols.

[Markus Scherer 20080204: Agreed.]

Note: I'm not talking here about the small minority of cases where
there's an actual unification with an existing *character*. That's
something else again.

7) Blank spaces: These should be mapped to Unicode space characters:
full = 3000, half = 0020 and quarter to one of the narrow spaces.

[Markus Scherer 20080204: U+3000 and U+0020 are already encoded in Shift-JIS (81 40 & 20). For source separation, the Emoji "blank spaces" need to be mapped to other code points.
We could unify

8) Recycling symbol: I'm concerned that this is a misidentification.
What I think the source symbols may intend is the "refresh" symbol, i.e.
a UI symbol from a browser context. This should not be unified with the
recycling symbol unless it can be established that recycling in the
sense of "recycling bin" is what this symbol is used for.

[Markus Scherer 20080204: Kat Momoi is Japanese, and he confirms that DoCoMo 342 (=KDDI 807) is unambiguously "recycling" in the sense of materials reuse.]

9) Parking: the generic image for parking is probably better a rectangle
or square with the letter P inscribed, rather than a circle (despite US
usage ;-) )

[Markus Scherer 20080204: KDDI 208. Agreed on the suggestion for the representative glyph.]

George Rhoten to the Symbols SC list 2008-02-01:
Your proposal seems to include a lot of stuff that is [...] already in
Unicode without mentioning the appropriate Unicode codepoints [...]

The obvious ones, at least to me, are the characters with an enclosing
circle or box.  Plenty of these can be represented with the character
followed by \u20DD or \u20DE.  The keypad 0-9, parking sign (\u24C5), and
several other letter signs come to mind as already existing in Unicode.
There are several circled ones in the \u2460-\u24FF block.

[Markus Scherer 20080204:

Using \u20DD or \u20DE would be similar to what happened with JIS X 0213.
For example, the mapping for \u00E6\u0300.
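
To make the distinction concrete, here is a minimal Python sketch, assuming the standard unicodedata module; the helper and variable names are illustrative. It builds the sequences suggested above and shows that a base character followed by an enclosing mark is a different encoding from a single circled code point such as \u24C5, and that the \u00E6\u0300 mapping stays a two-code-point sequence under NFC.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE
ENCLOSING_SQUARE = "\u20DE"  # COMBINING ENCLOSING SQUARE

def enclose(base: str, mark: str) -> str:
    """Build a sequence: base character followed by an enclosing mark."""
    return base + mark

keypad_one = enclose("1", ENCLOSING_SQUARE)
parking_sequence = enclose("P", ENCLOSING_CIRCLE)
parking_single = "\u24C5"  # CIRCLED LATIN CAPITAL LETTER P

# The sequence and the single code point are distinct encodings:
# NFC normalization does not turn one into the other.
print(len(parking_sequence), len(parking_single))
print(unicodedata.normalize("NFC", parking_sequence) == parking_single)

# A mapping to a sequence: no precomposed form exists for this pair,
# so NFC leaves the two code points in place.
ae_grave = "\u00E6\u0300"
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", ae_grave)])
```

A mapping table that targets such sequences therefore has to handle one-to-many entries, which is the practical cost of this approach.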

The dagger is already represented by character \u2020.

[Markus Scherer 20080204: The dagger is not part of the proposal.]

The wavy line could be \u3030, \uFE4B or \uFE4F.  Maybe there are others.

[Markus Scherer 20080204: DoCoMo 165 & 166 are used like decorative versions of U+30FC Prolonged Sound Mark. On reflection, they should be Modifier Letters, not Symbols; therefore DoCoMo 165 Wavy Length Mark is not appropriate for unification with the other wavy dashes etc.]

There are several blank spaces.  \u2000-\u200A come to mind.  I find it
difficult to believe that the blank spaces can't be unified with the large
number of white spaces that already exist in Unicode.

[Markus Scherer 20080204: Yes, see response to Asmus' feedback above.]
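
The source-separation concern can be illustrated with a small Python sketch, assuming Python's built-in shift_jis codec and the unicodedata module; the variable names are illustrative. It confirms that Shift-JIS bytes 81 40 and 20 already decode to U+3000 and U+0020, so mapping the Emoji blank spaces to those same code points would lose the round trip, and it lists the existing spaces at U+2000..U+200A mentioned above.

```python
import unicodedata

# U+3000 and U+0020 already round-trip through Shift-JIS as 81 40 and 20.
ideographic_space = bytes([0x81, 0x40]).decode("shift_jis")
ascii_space = bytes([0x20]).decode("shift_jis")
print(f"U+{ord(ideographic_space):04X}", f"U+{ord(ascii_space):04X}")

# The existing spaces at U+2000..U+200A, all general category Zs.
candidates = [chr(cp) for cp in range(0x2000, 0x200B)]
for ch in candidates:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```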

I don't claim this to be a comprehensive list of characters that I think
are already in Unicode, but these were my first impressions.

It would be helpful if the large table noted which symbols are not in the
proposal.  For example, noting that section 8 is not in the proposal and
it is for reference only.  That would be helpful.  I'm not sure if the flags
fall under the company logo policy of Unicode.  There was a discussion
about the flags on the unicore list when the proposal first came out.