Emoji Encoding Proposal: Progress Report
This progress report includes a modified copy of the feedback document L2/08-106
which was discussed on 2008-02-05 during UTC #114.
Notes from the February meeting are marked in yellow italics (or a check mark for "ok"),
and progress report notes are marked in blue and prefixed with "Done:"
Feedback that needs further work or discussion has been moved to the Emoji Symbols: Open Issues document.
The updated table is at http://www.unicode.org/~scherer/emoji/table/emoji-20080812.html
Asmus Freytag to the Symbols SC list 2008-02-01:
1) daggers: Unicode has single dagger - I see no reason why these can't
be unified with the existing character, even though the double dagger in
Unicode is a stacked dagger, so this would introduce an alternate form
of "double" dagger.
Michael Everson 20080201:
The typographic daggers are not knives.
Markus Scherer 20080204:
Nor are they part of the proposal. Like everything else in section 8,
they are included only for reference, and are not part of the
✓ Done: This is now explicitly noted in the heading for Section 8 of the updated table.
2) repetitive symbols: Unicode has a precedent for encoding multiples
of basic characters as units, instead of insisting that they be
represented as a sequence of characters: ellipses, integrals, primes,
etc. I see no reason to insist that fast forward and reverse, and
similar symbols be encoded as sequences of triangles. These should have
been encoded as units long ago (when the eject symbol was added).
[Markus Scherer 20080204: Support for encoding KDDI 8, as opposed to using a pair of U+25B6. Confirmed with Asmus 20080205.]
✓ Done: No action taken. The current proposal is sufficient.
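The precedent cited in item 2 can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard unicodedata module; the helper name as_sequence and the choice of sample characters are illustrative assumptions, not part of the proposal. It shows that the ellipsis, double integral and double prime are each encoded as a single unit even though each has a compatibility decomposition into a sequence of the basic character.

```python
import unicodedata

# Sample "multiples" already encoded as single characters (illustrative selection).
MULTIPLES = {
    "\u2026": "HORIZONTAL ELLIPSIS",
    "\u222c": "DOUBLE INTEGRAL",
    "\u2033": "DOUBLE PRIME",
}

def as_sequence(ch):
    """Return the compatibility decomposition (NFKD) of a single character."""
    return unicodedata.normalize("NFKD", ch)

for ch, expected_name in MULTIPLES.items():
    # Each unit is one code point whose decomposition is a run of a basic character.
    assert unicodedata.name(ch) == expected_name
    print(f"U+{ord(ch):04X} {expected_name} -> "
          + " ".join(f"U+{ord(c):04X}" for c in as_sequence(ch)))
```

A fast forward symbol encoded as a unit would follow the same pattern relative to a pair of U+25B6.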
3) geometric symbols in color: if there's a simple red/green or
orange/blue color dichotomy, mapping that to black/white would seem to
be an appropriate solution (rather than mapping only green to black).
[Asmus Freytag 20080205:
"Geometric Symbols" are squares, triangles etc. Your sets are full of examples of red/green squares of different sizes and blue/orange triangles or other shapes.
You've mapped the green one in many cases to the black symbol in Unicode.
What I'm suggesting is that you decide whether green is "white" or "black" and then map the green/red pair to the white/black pair (or the black/white pair).
The same goes for blue/orange pairs.
In a quick look at these I have not found instances where you had all colors of the rainbow for the *same* symbol, so mapping the complementary color pairs
to the black/white pair in Unicode allows you to map many more of the geometrical symbols than you currently do.
My comment aims at a general strategy. Either you decide that the colors in the Emoji sets are in fact in pairs, and that this means you can map a color pair
to the black and white pair, or you decide not to.]
red/orange to WHITE
green/blue to BLACK
1. APPLE (red)/GREEN APPLE (green) --> WHITE APPLE/BLACK APPLE
2. HEAVY CHECK MARK (blue)/HEAVY CHECK MARK 2 (red) --> HEAVY CHECK MARK/WHITE HEAVY CHECK MARK
3. BLACK CIRCLE/BLACK CIRCLE 2 (red/blue) --> WHITE CIRCLE/BLACK CIRCLE
4. MEDIUM BLACK CIRCLE/MEDIUM BLACK CIRCLE 2 (red/blue) --> MEDIUM WHITE CIRCLE/MEDIUM BLACK CIRCLE
5. BLACK STAR 2/BLACK STAR 3 (non-solid pulsating star (orange)/solid blue glowing star) --> WHITE GLOWING STAR 2/BLACK GLOWING STAR
6. BLACK SQUARE/BLACK SQUARE 2 (orange/green extra large) --> WHITE SQUARE/BLACK SQUARE (orange/green extra large)
7. BLACK SMALL SQUARE/BLACK SMALL SQUARE 2 (orange/green) --> WHITE SMALL SQUARE/BLACK SMALL SQUARE (orange/green)
8. BLACK MEDIUM SMALL SQUARE/BLACK MEDIUM SMALL SQUARE 2 (orange/green) --> WHITE MEDIUM SMALL SQUARE/BLACK MEDIUM SMALL SQUARE
9. BLACK MEDIUM SQUARE/BLACK MEDIUM SQUARE 2 (orange/green) --> WHITE MEDIUM SQUARE/BLACK MEDIUM SQUARE (orange/green)
10. BLACK DIAMOND/BLACK DIAMOND 2 (orange/blue) --> WHITE DIAMOND/BLACK DIAMOND
11. BLACK SMALL DIAMOND/BLACK SMALL DIAMOND 2 (orange/blue) --> WHITE SMALL DIAMOND/BLACK SMALL DIAMOND (orange/blue)
Orange to WHITE
Blue to BLACK
Green to CHECKERED
12. BOOK 1/BOOK 2/BOOK 3 (Green/Blue/Orange colored closed books) and vertical book --> BLACK BOOK/WHITE BOOK and BOOK WITH HORIZONTAL/VERTICAL FILL
YELLOW to WHITE
BLUE to BLACK
GREEN to CHECKERED
PURPLE to STRIPED
13. HEART WITH BLUE/GREEN/YELLOW/PURPLE COLOR --> BLACK HEART/CHECKERED HEART/WHITE HEART/STRIPED HEART. Also added preferred color note in the Comments column.
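The two-color rule used for items 1 to 11 (red/orange to WHITE, green/blue to BLACK) can be sketched as a small lookup. This is only an illustration of the general strategy under stated assumptions: the names COLOR_TO_FILL and map_color_pair are hypothetical, and the sketch ignores special cases such as item 2, where the blue form keeps the unprefixed name HEAVY CHECK MARK, and the three- and four-color sets in items 12 and 13.

```python
# Fill assigned to each source color, following the two-color rule in the notes above.
COLOR_TO_FILL = {
    "red": "WHITE",
    "orange": "WHITE",
    "green": "BLACK",
    "blue": "BLACK",
}

def map_color_pair(base_name, colors):
    """Map one symbol that occurs in two colors to a WHITE/BLACK name pair.

    base_name: the shape name without a fill prefix, e.g. "SQUARE".
    colors: the two source colors, e.g. ("orange", "green").
    """
    fills = [COLOR_TO_FILL[color] for color in colors]
    # The strategy only works if the pair really is a dichotomy:
    # exactly one color must map to WHITE and the other to BLACK.
    if sorted(fills) != ["BLACK", "WHITE"]:
        raise ValueError(f"{colors} is not a WHITE/BLACK color pair")
    return {color: f"{fill} {base_name}" for color, fill in zip(colors, fills)}

print(map_color_pair("SQUARE", ("orange", "green")))
print(map_color_pair("DIAMOND", ("orange", "blue")))
```

The explicit check reflects the point made in the feedback: the mapping is only applicable when the colors in the Emoji sets do in fact come in complementary pairs.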
4) curved arrows. It appears that the block of arrows at 2900 was not
considered for mapping. See U+2934 etc.
[Markus Scherer 20080204: KDDI 731=U+2934 and KDDI 732=U+2935?] <-- check to see if they are reflected in the table.
✓ Done: KDDI 731=U+2934 and KDDI 732=U+2935 were not reflected in the original mapping. They are now reflected.
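As a quick sanity check of the two unifications, the target code points can be looked up with Python's standard unicodedata module. This is a sketch only; the dictionary CURVED_ARROW_UNIFICATIONS and the helper in_arrows_block are hypothetical names, and the block range U+2900..U+297F is that of Supplemental Arrows-B.

```python
import unicodedata

# KDDI symbol number -> existing Unicode code point, as recorded in the note above.
CURVED_ARROW_UNIFICATIONS = {
    "KDDI 731": 0x2934,
    "KDDI 732": 0x2935,
}

def in_arrows_block(code_point, start=0x2900, end=0x297F):
    """True if the code point lies in the arrows block starting at U+2900."""
    return start <= code_point <= end

for source, code_point in CURVED_ARROW_UNIFICATIONS.items():
    assert in_arrows_block(code_point)
    print(f"{source} = U+{code_point:04X} {unicodedata.name(chr(code_point))}")
```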
4a) Other symbols with color dichotomy: Instead of coding a heavy check
mark 2 for the red alternate of a red/blue pair, it would be better to
code a white (outlined) form in Unicode and then consistently map one
color to the black and one to the white form, as for all the other
symbols with color dichotomy.
[Markus Scherer 20080204: Instead, change KDDI 132 to unify with U+2713 Check Mark.]
Add new white heavy check mark.
Done: HEAVY CHECK MARK 2 --> WHITE HEAVY CHECK MARK (Note: HEAVY CHECK MARK was left alone.)
6) My last comment assumes that the intent would be to identify (where
possible) the actual universally applicable abstract character, not
simply a mapping target for a particular element of these particular
set(s) of emoji. My rough estimate is that between 30% and 50% of the
characters from the cross mapping table correspond to a reasonably
generic symbol. That's the one that should be identified and encoded in
those cases. For the remainder, the emoji sets are the primary if not
only environment where the symbol is used. In those cases I see no need
to shoehorn what is clearly a novel character into an identification
with any pre-existing symbols.
[Markus Scherer 20080204: Agreed.]
✓ Done: No action
Note: I'm not talking here about the small minority of cases where
there's an actual unification with an existing *character*. That's
something else again.
7) Blank spaces: These should be mapped to Unicode space characters:
full = 3000, half = 0020 and quarter to one of the narrow spaces.
[Markus Scherer 20080204: U+3000 and U+0020 are already encoded in Shift-JIS (81 40 & 20). For source separation, the Emoji "blank spaces" need to be mapped to other code points.
We could unify
- KDDI 173 "completely blank"=U+2003 Em Space
- KDDI 174 "half blank"=U+2002 En Space
- KDDI 175 "one quarter blank"=U+2005 Four-Per-Em Space.]
✓ Done: The proposal table now reflects these changes:
- KDDI 173 "completely blank"=U+2003 Em Space
- KDDI 174 "half blank"=U+2002 En Space
- KDDI 175 "one quarter blank"=U+2005 Four-Per-Em Space
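The source-separation argument can be illustrated with Python's built-in shift_jis codec. This is a sketch under the assumption that the codec is a fair stand-in for the carriers' base Shift-JIS repertoire; the names PROPOSED_BLANKS and in_shift_jis are hypothetical. U+3000 and U+0020 already round-trip through Shift-JIS, so they cannot also serve as distinct targets for the Emoji blanks, whereas the three proposed space characters have no Shift-JIS encoding of their own.

```python
import unicodedata

# Proposed targets for the Emoji "blank spaces", as listed above.
PROPOSED_BLANKS = {
    "KDDI 173": "\u2003",  # EM SPACE
    "KDDI 174": "\u2002",  # EN SPACE
    "KDDI 175": "\u2005",  # FOUR-PER-EM SPACE
}

def in_shift_jis(ch):
    """True if the character can be encoded with Python's shift_jis codec."""
    try:
        ch.encode("shift_jis")
        return True
    except UnicodeEncodeError:
        return False

# Already taken: these spaces exist in Shift-JIS itself.
for ch in ("\u3000", "\u0020"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: in Shift-JIS = {in_shift_jis(ch)}")

# Proposed targets: distinct from anything Shift-JIS already maps.
for source, ch in PROPOSED_BLANKS.items():
    print(f"{source} -> U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"in Shift-JIS = {in_shift_jis(ch)}")
```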
8) Recycling symbol: I'm concerned that this is a misidentification.
What I think the source symbols may intend is the "refresh" symbol, i.e.
a UI symbol from a browser context. This should not be unified with the
recycling symbol unless it can be established that recycling in the
sense of "recycling bin" is what this symbol is used for.
[Markus Scherer 20080204: Kat Momoi is Japanese, and he confirms that DoCoMo 342 (=KDDI 807) is unambiguously "recycling" in the sense of materials reuse.]✓ Done: No action needed
9) Parking: the generic image for parking is probably better as a rectangle
or square with the letter P inscribed, rather than a circle (despite US
usage ;-) )
[Markus Scherer 20080204: KDDI 208. Agreed on the suggestion for the representative glyph.]
✓ Done: Replaced the DoCoMo emoji with the KDDI emoji, which shows a square enclosure. (For illustrative purposes only.)
George Rhoten to the Symbols SC list 2008-02-01:
Your proposal seems to include a lot of stuff that is [...] already in
Unicode without mentioning the appropriate Unicode codepoints [...]
There are several blank spaces. \u2000-\u200A come to mind. I find it
difficult to believe that the blank spaces can't be unified with the large
number of white spaces that already exist in Unicode.
[Markus Scherer 20080204: Yes, see response to Asmus' feedback above.]
✓ Done -- see above.
I don't claim this to be a comprehensive list of characters that I think
are already in Unicode, but these were my first impressions.
It would be helpful if the large table noted which symbols are not in the
proposal, for example by noting that section 8 is not in the proposal and
is for reference only. I'm not sure if the flags
fall under the company logo policy of Unicode. There was a discussion
about the flags on the unicore list when the proposal first came out.
About keypad symbols:
Information: Kat Momoi: Keypad characters are used mainly to select choices on a mobile phone screen. For example, you might want to see 1) a map, 2) transportation directions, 3) walking directions, 4) etc. Each of these selections would correspond to a keypad number.
Done: Mapped KEYPAD 0 - 9 to ASCII digit + U+20E3. They are not yet represented as such in the 1st and 2nd columns of the table. Our script doesn't currently have a provision for combined characters; it will be modified in the very near future. KEYPAD 10 has not been changed yet (see Open Issues doc).
- Use U+20E3 Combining Enclosing Keycap?
- Ken & Mark: Yes, but keycap "10" needs to be separate
- "10" + keycap does not work -- add "keycap-10" character?
- Note: There is U+2469 circled 10
- Make it a vanilla encircled 10 of some sort??
- Open question:
- Are these really (phone) keypads?
- Or for an adding machine?
- Or just decoration?
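The keycap approach discussed above can be sketched as follows. The helper names keypad_sequence and describe are hypothetical. The sketch builds each keypad digit as an ASCII digit followed by U+20E3, and rejects "10" because two digits are not a single base character for the enclosing mark, which is why keycap 10 remains an open issue.

```python
import unicodedata

KEYCAP = "\u20e3"  # U+20E3 COMBINING ENCLOSING KEYCAP

def keypad_sequence(digit):
    """Return the two-code-point sequence for one keypad digit (0-9)."""
    if len(digit) != 1 or digit not in "0123456789":
        # "10" is two base characters, so a single enclosing mark
        # does not apply to it as a unit.
        raise ValueError(f"expected a single ASCII digit, got {digit!r}")
    return digit + KEYCAP

def describe(text):
    """List the code points and character names that make up a string."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for d in "0123456789":
    print(d, describe(keypad_sequence(d)))

# The existing circled ten mentioned in the notes, for comparison.
print(describe("\u2469"))
```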
Done: Removed Google-invented Emoji symbols from the proposal and the table.
Done:
- The following were unified with existing characters:
- RIGHTWARDS ARROW CURVED UPWARDS --> U+2934
- RIGHTWARDS ARROW CURVED DOWNWARDS --> U+2935