Working Draft Proposal (2) for Encoding Emoji Symbols

L2/08-081
Date: 2008-01-28
Updated version of L2/07-257
Authors: Kat Momoi, Mark Davis, Markus Scherer

This is a working draft proposal for the Symbols Subcommittee and UTC. It is an update to L2/07-257.

At UTC meeting 112 we introduced a proposal (L2/07-257) for the encoding of Emoji symbols in The Unicode Standard. That proposal was discussed during the meeting, and this revised version incorporates the feedback received. It also takes into account the ARIB symbols proposed in document L2/07-391.


The proposal consists of two documents:
  1. This summary document.
  2. A draft table of proposed characters for encoding.

The proposal does not yet contain proposed code points. They could all go into a block in the supplementary planes, or some of them could be added to existing blocks of similar symbols in the BMP. We have no particular requirements for the positioning or ordering of these characters, although we have tried to group the characters in a reasonable fashion.

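The proposed repertoire for encoding includes all of the symbols in the table, with the following exceptions: symbols that we propose to unify with existing Unicode characters, and symbols that are already proposed for encoding as part of the Japanese TV Symbols (L2/07-391). These exceptions are listed in the table for comparison only. (See the Chart Legend.)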

We are following the same basic principles as the UTC used when assessing the Japanese TV Symbols proposal. (See page 2 of L2/07-391.)

Background

This submission covers the Emoji symbols that are in widespread use by DoCoMo, KDDI and Softbank for their mobile phone networks, plus nine symbols defined by Google. These symbols are encoded in carrier-specific versions of Shift-JIS (as User-Defined Characters), and, in the case of KDDI, in a carrier-specific version of ISO-2022-JP. There are mapping tables in use in the industry between these character sets, with both roundtrip and fallback mappings. These symbols are also supported in web mail services by Yahoo! Mail and Google Mail. (Yahoo! Mail currently supports a subset.)

We took into consideration the following factors in coming up with this revised proposal:

  1. Source separation rule: If a single carrier separates two characters (anywhere in its character set, including standard JIS codes), then we mapped them to two separate Unicode characters. (This is a hard and fast rule.)
  2. Reuse: We mapped to existing Unicode symbols where appropriate.
  3. Separating generic symbols: If Unicode had a set of related symbols, but no one character in the set was as generic as in the Emoji symbol sets, then we encoded a new character. For example, the Emoji sets do not distinguish between waxing and waning crescent moons.
  4. Colors and Animation: We encoded symbols as characters, abstracting away from colors and animation. We only distinguished by nominal color or animation for the source separation rule. (See naming below.)
  5. Existing cross-mapping tables: We followed the tables mentioned above as much as possible, but we tentatively disunified in some cases where the visual images were very different and not semantically associated. For example:
    1. We disunified the 'M' symbol for Metro from the Metro train image. The 'M' symbol would have translation problems. (This is similar to the problems with the international currency symbol and the proposal for a "generic decimal separator".)
    2. On the other hand, we unified the sets of Zodiac symbols, even though the images shown by carriers vary widely. This is because they clearly belong to a cohesive set which corresponds across carriers.
  6. Least-marked common symbol: For a set of symbols which each could map to an existing Unicode code point, we chose the symbol that was shared among the most carriers (according to the cross-mapping tables) and had the least-marked form.
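
The source separation rule (factor 1) can be expressed as a simple check. The following is a minimal sketch, not the procedure actually used to build the table: the function name `can_unify` and all carrier codes shown are hypothetical placeholders. It assumes a candidate unification is given as a collection of (carrier, code) pairs.

```python
from collections import Counter

def can_unify(symbols):
    """Check a candidate unification against the source separation rule.

    `symbols` is an iterable of (carrier, code) pairs proposed to map to
    one Unicode character. The rule is violated if any single carrier
    contributes two or more distinct codes.
    """
    per_carrier = Counter(carrier for carrier, _code in set(symbols))
    return all(count == 1 for count in per_carrier.values())

# Hypothetical placeholder codes, not real carrier code points.
group_a = [("KDDI", "k1"), ("DoCoMo", "d1"), ("Softbank", "s1")]
group_b = [("KDDI", "k2"), ("KDDI", "k3"), ("Softbank", "s2")]

print(can_unify(group_a))  # one code per carrier, so unification is allowed
print(can_unify(group_b))  # KDDI separates k2 and k3, so they must be disunified
```

The other factors (reuse, generic symbols, least-marked form) involve editorial judgment and are not modeled here.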

Note: We tried to avoid disunification in Unicode where there are roundtrip mappings between carriers. However, disunification can be done where necessary. As the following diagram illustrates, roundtrip mappings between carrier Shift-JIS character sets can be maintained by having the mapping tables between Unicode and each carrier's Shift-JIS version use appropriate fallback mappings.

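[Diagram: three columns labeled KDDI, Unicode and Softbank, showing carrier symbols x (KDDI) and y (Softbank) mapped through Unicode characters X and Y.]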
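
One way to read the note above is sketched below. This is an illustrative assumption, not a table from the proposal: all identifiers are placeholders, and the choice of which Unicode character serves as the fallback for which carrier is our own reading of the diagram. KDDI symbol x and Softbank symbol y are disunified into Unicode characters X and Y. Each carrier has a roundtrip mapping to its own character and a one-way fallback from the other.

```python
# Placeholder identifiers; none of these are real code points.
KDDI_X, SOFTBANK_Y = "KDDI:x", "Softbank:y"
UNI_X, UNI_Y = "Unicode:X", "Unicode:Y"

# Carrier -> Unicode (each carrier symbol has exactly one Unicode target).
TO_UNICODE = {
    "KDDI": {KDDI_X: UNI_X},
    "Softbank": {SOFTBANK_Y: UNI_Y},
}
# Unicode -> carrier, roundtrip entries (inverse of TO_UNICODE).
FROM_UNICODE_ROUNDTRIP = {
    "KDDI": {UNI_X: KDDI_X},
    "Softbank": {UNI_Y: SOFTBANK_Y},
}
# Unicode -> carrier, one-way fallback entries for the disunified partner.
FROM_UNICODE_FALLBACK = {
    "KDDI": {UNI_Y: KDDI_X},
    "Softbank": {UNI_X: SOFTBANK_Y},
}

def to_unicode(carrier, code):
    return TO_UNICODE[carrier][code]

def from_unicode(carrier, char):
    # Prefer the roundtrip mapping; use the fallback only if none exists.
    roundtrip = FROM_UNICODE_ROUNDTRIP[carrier]
    if char in roundtrip:
        return roundtrip[char]
    return FROM_UNICODE_FALLBACK[carrier][char]

def convert(src_carrier, dst_carrier, code):
    """Convert a carrier code to another carrier's code via Unicode."""
    return from_unicode(dst_carrier, to_unicode(src_carrier, code))

# KDDI -> Softbank -> KDDI returns the original symbol.
there = convert("KDDI", "Softbank", KDDI_X)
back = convert("Softbank", "KDDI", there)
print(there, back)
```

Under these assumptions, x and y still roundtrip between the two carriers even though X and Y are distinct in Unicode.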

Chart Legend

Columns:
  1. Representation
    1. Representative symbols. Note: These are colored and sometimes animated, but would be black and white in the code charts.
    2. Unicode code points where we propose unification with existing Unicode characters. The proposed repertoire for encoding excludes characters that can be unified.
    3. ARIB number from Japanese TV Symbols (L2/07-391). The proposed repertoire for encoding excludes characters that are proposed for encoding as part of the Japanese TV Symbols. (See Action Item [113-A12])
  2. Proposed Character Name
  3. Comments
  4. KDDI
    1. KDDI Icon
    2. KDDI Emoji catalog number
    3. KDDI Shift-JIS code
    4. KDDI ISO-2022-JP code
  5. DoCoMo
    1. DoCoMo Icon
    2. DoCoMo Emoji catalog number
    3. DoCoMo Shift-JIS code
  6. Softbank
    1. Softbank Icon
    2. Softbank Emoji catalog number
    3. Softbank Shift-JIS code
  7. G: Check marks for symbols defined by Google
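
For readers who want to process the table programmatically, the columns above could be modeled roughly as follows. This is only a sketch: the class and field names are hypothetical, icon images are omitted, and all values in the example are placeholders. The method `in_proposed_repertoire` reflects the two exclusions stated under the Representation column.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CarrierEntry:
    catalog_number: str
    shift_jis: str
    iso_2022_jp: Optional[str] = None  # listed for KDDI only

@dataclass
class ChartRow:
    proposed_name: str
    comments: str = ""
    existing_code_point: Optional[str] = None  # set when unification is proposed
    arib_number: Optional[str] = None          # set when covered by L2/07-391
    kddi: Optional[CarrierEntry] = None
    docomo: Optional[CarrierEntry] = None
    softbank: Optional[CarrierEntry] = None
    google: bool = False                       # the "G" check mark

    def in_proposed_repertoire(self) -> bool:
        # Rows unified with existing characters, or covered by the
        # Japanese TV Symbols proposal, are listed for comparison only.
        return self.existing_code_point is None and self.arib_number is None

# Placeholder rows; names and codes are illustrative, not from the table.
new_row = ChartRow("EXAMPLE NEW SYMBOL", kddi=CarrierEntry("0", "XXXX", "XXXX"))
unified_row = ChartRow("EXAMPLE UNIFIED SYMBOL", existing_code_point="U+XXXX")

print(new_row.in_proposed_repertoire())
print(unified_row.in_proposed_repertoire())
```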

Use of colors:

Symbol images are taken from reference materials from the carriers. Symbol images for the Unicode column use a Unicode character where one exists, and otherwise one of the carrier symbols. The latter would be replaced by an appropriate black-and-white representative glyph.

Proposed character names are tentative, typically based on the glosses of the carrier symbols or the visual appearance. We followed the conventions for existing Unicode characters where possible, in particular using "BLACK" for "filled" and "WHITE" for "hollow". We excluded nominal color and animation from proposed character names except where necessary for distinction.

Resources

The conversion tables:
Additional non-carrier references: For AU, DoCoMo and Softbank
See also: WAP Pictogram Specification approved Version 1.1 -- part of OMA Browsing V2.3 Enabler Specification