Working Draft Proposal for Encoding Emoji Symbols

L2/07-257
Date: 2007-08-03
Authors: Kat Momoi, Mark Davis, Markus Scherer

This is a working draft proposal for the Symbols Subcommittee and UTC. We were not able to finish this draft in time for committee review before the start of the UTC meeting. However, discussion of this working draft at the UTC is scheduled for Thursday, 2007-Aug-09, and comments before then are invited.


The proposal consists of two documents:
  1. This summary document.
  2. A draft table of correspondences.

This is a preliminary proposal, for discussion. The proposal does not yet contain proposed code points since that would be premature. We have not had time to review all of the characters in detail. There may be spelling errors and other mistakes. In particular, we would appreciate special attention to the symbols for faces, books and arrows.

Background

This submission covers the Emoji symbols that are in widespread use by DoCoMo, KDDI and Softbank for their mobile phone networks. These symbols are encoded in carrier-specific versions of Shift-JIS (as User-Defined Characters). There are mapping tables between these character sets, with both roundtrip and fallback mappings.

We took into consideration the following factors in coming up with this working draft:

  1. Source separation rule: If a single carrier separates two characters (anywhere in the character set, so including standard JIS codes), then we mapped them to two separate Unicode characters. (This is a hard and fast rule.)
  2. Reuse: We mapped to existing Unicode symbols where appropriate.
  3. Separating generic symbols: If Unicode had a set of related symbols, but no single character in that set was as generic as the corresponding symbol in the Emoji sets, then we encoded a new character. For example, the Emoji sets do not distinguish between waxing and waning crescent moons.
  4. Colors and Animation: We encoded symbols as characters, abstracting away from colors and animation. We only distinguished by nominal color or animation for the source separation rule. (See naming below.)
  5. Existing cross-mapping tables: We followed the tables mentioned above as much as possible, but we tentatively disunified in some cases where the visual images were very different and not semantically associated. For example:
    1. We disunified the 'M' symbol for Metro from the Metro train image. The 'M' symbol would have translation problems. (This is similar to the problems with the international currency symbol and the proposal for a "generic decimal separator".)
    2. On the other hand, we unified the sets of Zodiac symbols, even though the images shown by carriers vary widely. This is because they clearly belong to a cohesive set which corresponds across carriers.
  6. Least-marked common symbol: For a set of symbols which each could map to an existing Unicode code point, we chose the symbol that was shared among the most carriers (according to the cross-mapping tables) and had the least-marked form.
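
The source separation rule (factor 1) can be checked mechanically against any proposed unification. The following is a minimal Python sketch, not part of the proposal's tooling. The function name, the (carrier, code) pair representation and the codes themselves are hypothetical placeholders, not real carrier code points.

```python
from collections import defaultdict

def carriers_violating_separation(group):
    """Given a proposed unification group of (carrier, code) pairs,
    return the carriers that contribute more than one distinct code.
    A non-empty result means the group breaks the source separation
    rule and must be split into separate Unicode characters."""
    codes_by_carrier = defaultdict(set)
    for carrier, code in group:
        codes_by_carrier[carrier].add(code)
    return sorted(c for c, codes in codes_by_carrier.items() if len(codes) > 1)

# Hypothetical placeholder codes, for illustration only.
ok_group = [("DoCoMo", "D-001"), ("KDDI", "K-017"), ("Softbank", "S-042")]
bad_group = [("DoCoMo", "D-001"), ("KDDI", "K-017"), ("KDDI", "K-018")]

print(carriers_violating_separation(ok_group))
print(carriers_violating_separation(bad_group))
```

A group with one code per carrier passes. A group in which a single carrier supplies two codes is flagged, since that carrier already separates the two characters.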

Note: We tried to avoid disunification in Unicode where there are roundtrip mappings between carriers. However, disunification can still be done where necessary. As the following diagram illustrates, roundtrip mappings between carrier Shift-JIS character sets can be maintained by having the mapping tables between Unicode and each carrier's Shift-JIS version use appropriate fallback mappings.

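[Diagram: three columns labeled KDDI, Unicode and Softbank, relating KDDI character x and Softbank character y to Unicode characters X and Y.]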
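
One possible reading of the fallback arrangement in the note above, sketched in Python. It assumes that KDDI character x and Softbank character y roundtrip to each other in the carrier tables, and that Unicode disunifies them into X and Y. The table names and the specific choice of fallback entries are assumptions made for illustration. They are not taken from the actual mapping tables.

```python
# Placeholder characters following the x / X / Y / y labels; not real code points.

# Roundtrip mappings: each carrier character has its own Unicode character.
KDDI_TO_UNICODE = {"x": "X"}
SOFTBANK_TO_UNICODE = {"y": "Y"}

# Unicode-to-carrier tables. Each holds one roundtrip entry plus one
# one-way fallback entry for the Unicode character the carrier lacks.
UNICODE_TO_KDDI = {
    "X": "x",  # roundtrip
    "Y": "x",  # fallback (assumed): KDDI has no separate character for Y
}
UNICODE_TO_SOFTBANK = {
    "Y": "y",  # roundtrip
    "X": "y",  # fallback (assumed): Softbank has no separate character for X
}

def kddi_to_softbank(ch):
    """Convert a KDDI character to Softbank, going through Unicode."""
    return UNICODE_TO_SOFTBANK[KDDI_TO_UNICODE[ch]]

def softbank_to_kddi(ch):
    """Convert a Softbank character to KDDI, going through Unicode."""
    return UNICODE_TO_KDDI[SOFTBANK_TO_UNICODE[ch]]

# Carrier-to-carrier roundtrips survive the disunification.
assert softbank_to_kddi(kddi_to_softbank("x")) == "x"
assert kddi_to_softbank(softbank_to_kddi("y")) == "y"
```

Under these assumptions the carrier-to-carrier conversion still roundtrips, even though X and Y remain distinct characters in Unicode.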

Chart Legend

Use of colors:

Symbol images are taken from reference materials from the carriers. Symbol images for the Unicode column use a Unicode character where one exists, and otherwise one of the carrier symbols. The latter would be replaced by an appropriate black-and-white representative glyph.

Proposed character names are tentative, typically based on the glosses of the carrier symbols or the visual appearance. We followed the conventions for existing Unicode characters where possible, in particular using "BLACK" for "filled" and "WHITE" for "hollow". We excluded nominal color and animation from proposed character names except where necessary for distinction.

Resources

The conversion tables:
Additional non-carrier references: For AU, DoCoMo and Softbank
See also: WAP Pictogram Specification approved Version 1.1 -- part of OMA Browsing V2.3 Enabler Specification