Emoji and Dingbats
Q: What are emoji?
A: Emoji are “picture characters” most frequently associated with cellular telephone usage in Japan, but also used in other East Asian countries and in other contexts. The word emoji comes from the Japanese 絵 (e ≅ picture) + 文 (mo ≅ writing) + 字 (ji ≅ character). Emoji on smartphones and in chat and email applications have become quite popular worldwide.
Emoji are often pictographs—images of things such as faces, weather, vehicles and buildings, food and drink, animals and plants—or icons that represent emotions, feelings, or activities. In cellular phone usage, many emoji characters are presented in color (sometimes as a multicolor image), and some are presented in animated form, usually as a repeating sequence of two to four images—for example, a pulsing red heart. [PE]
Q: Are emoji the same thing as emoticons?
A: Not exactly. Emoticons (from “emotion” plus “icon”) are specifically intended to depict facial expression or body posture as a way of conveying emotion or attitude in e-mail and text messages. They originated as ASCII character combinations such as :-) to indicate a smile—and by extension, a joke—and :-( to indicate a frown. In East Asia, a number of more elaborate sequences have been developed, such as (")(-_-)(") showing an upset face with hands raised. Over time, many systems began replacing such sequences with images, and also began providing ways to input emoticon images directly, such as a menu or palette. The emoji sets used by Japanese cell phone carriers contain a large number of characters for emoticon images, along with many other non-emoticon emoji.[PE]
Q: What are the most popular emoji characters?
A: The site emojitracker.com tracks the real-time use of many emoji on Twitter, so you can see the most and least used emoji characters there.
Q: Can you point me to some examples of emoji characters in Unicode?
A: Emoji are spread throughout many blocks of Unicode. A good sample can be found in the Miscellaneous Symbols and Pictographs block. Notice that in these charts emoji are shown in black and white, whereas they typically appear in color on mobile phones and computers.
Q: Do emoji characters have to look the same wherever they are used?
A: No, they don’t have to look the same. For example, the images used for U+1F36D LOLLIPOP, U+1F36E CUSTARD, U+1F36F HONEY POT, and U+1F370 SHORTCAKE can differ considerably from one font or platform to another.
In other words, any pictorial representation of a lollipop, custard, honey pot, or shortcake (whether a line drawing, a grayscale image, or a colored image) is considered an acceptable rendition of the corresponding emoji.
Q: What about diversity?
A: As with the examples of emoji characters representing food items above, an emoji character like U+1F474 OLDER MAN can vary in appearance depending on the font. Unicode does not require a particular racial or ethnic appearance, or for that matter a particular hairstyle, bald or hirsute. However, because of concerns about the emoji characters for people, Unicode Consortium members are developing proposals to provide more diversity. These proposals include investigating the use of the Fitzpatrick scale to allow greater diversity in the presentation of emoji characters.
See also “What about characters whose names include WHITE or BLACK?”
Q: Unicode 7.0 included many new emoji, but did not address the issue of diversity. Isn't that more important than a sunglasses emoji?
A: There is a long development cycle for characters, so the sunglasses character was first proposed years before Unicode 7.0 was released. Any proposals under consideration will also take time to assess and develop. See also “How can I get the Unicode Consortium to add a Unicode emoji?”
Q: How were emoji encoded on cell phones?
A: Cell phone carriers in Japan have long encoded some emoji in Shift-JIS and ISO-2022 as extensions of the JIS X 0208 character set. A core set of 722 emoji constitutes the union of the emoji sets encoded in this way by the three most popular cell phone carriers in Japan. These core emoji characters are interchanged as plain text by millions of people daily (in SMS text messages and e-mail subject lines, for example), and need to be handled by e-mail systems, search engines, publishing systems, databases, and so on. For emoji beyond this core set (including those that are still being created), vendors have added rich text support, and use approaches such as embedded graphics. Similar techniques (embedded graphics or escape tags designating emoji) are also typically used for emoji support in China and the Republic of Korea. [PE]
Q: How were emoji originally encoded in Unicode?
A: Of the 722 characters in the core emoji set, 114 are mapped to sequences of one or more characters available in Unicode before Version 6.0. The other 608 are mapped to sequences of one or more characters added in Unicode 6.0, primarily in the Miscellaneous Symbols and Pictographs, Emoticons, and Transport and Map Symbols blocks, but also in blocks such as Dingbats and Technical Symbols. There is no block set aside specifically for emoji.
Characters that are separate in the extended JIS X 0208 sets used by the three major cell phone carriers in Japan are mapped to separate characters in Unicode in what is known as the Emoji Source Separation Rule. For example, the emoji core set includes a character mapped to U+1F3B5 MUSICAL NOTE; this could not be unified with U+266A EIGHTH NOTE, because both exist as separate characters in the extended JIS sets used by all three of the major cell phone carriers in Japan.
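Because the two notes are separate code points, software treats them as distinct characters. A quick check with Python's standard unicodedata module (a sketch; the Unicode version it reports depends on the Python build, though both characters are present in any current release) confirms their names and shows that even compatibility normalization (NFKC) does not fold one into the other:

```python
import unicodedata

# The two code points named in the answer: the core-emoji MUSICAL NOTE
# and the older EIGHTH NOTE, kept separate by the Emoji Source Separation Rule.
MUSICAL_NOTE = "\U0001F3B5"
EIGHTH_NOTE = "\u266A"

for ch in (MUSICAL_NOTE, EIGHTH_NOTE):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+1F3B5 MUSICAL NOTE
# U+266A EIGHTH NOTE

# Neither character has a decomposition, so NFKC leaves both unchanged
# and they still compare unequal.
print(unicodedata.normalize("NFKC", MUSICAL_NOTE) == unicodedata.normalize("NFKC", EIGHTH_NOTE))
# False
```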
Because characters in the core emoji set are treated as pictographs, they are encoded in Unicode based primarily on their general appearance, not on an intended semantic. In fact, when used as emoji, many of these characters acquire multiple meanings based on their appearance; for example, an emoji character for “bank” which includes the letters “BK” has taken on the secondary meaning “bakkureru” (a slang term for evading one’s responsibilities). The identity of characters in the emoji core set is defined primarily by their mapping to Unicode, as specified in the file EmojiSources.txt. [PE]
Q: How many emoji characters are in Unicode now?
A: This question does not have a simple answer, because there is no clear line separating which pictographs should and should not be displayed with a typical emoji style. But roughly speaking, aside from the core set, Unicode Version 7.0 has about 550 other characters that could also reasonably be displayed with typical emoji style (colored), such as U+1F46D TWO WOMEN HOLDING HANDS. There are also ways of representing emoji for national flags, adding about 240 others.
Q: How should emoji be displayed?
A: While emoji symbols may be presented using color and animation, they need not be. Because many characters in the core emoji sets are unified with Unicode characters that originally came from other sources, there is no way based on character code alone to tell whether a character should be presented using an “emoji” style; that decision depends on context. [PE]
Q: Is there any way to control the “emoji” style?
A: Certain characters can be followed by a special character called a variation selector to request a particular appearance: U+FE0F VARIATION SELECTOR-16 for the emoji style (typically colored), and U+FE0E VARIATION SELECTOR-15 for the text style (black and white). Only certain characters qualify; the exact characters are listed in the file StandardizedVariants.
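A minimal Python sketch of requesting a presentation by appending a variation selector to a base character. The helper name with_presentation is hypothetical, and it does not consult StandardizedVariants, so it will attach a selector even to a character for which no variation sequence is defined; whether the result actually renders differently is up to the font and platform.

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: request text (black and white) style
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: request emoji (typically colored) style


def with_presentation(ch: str, emoji: bool) -> str:
    """Return `ch` followed by VS16 (emoji=True) or VS15 (emoji=False).

    Hypothetical helper: it only checks that `ch` is a single code point.
    It does NOT verify that `ch` appears in StandardizedVariants.
    """
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    return ch + (VS16 if emoji else VS15)


heart = "\u2764"  # U+2764 HEAVY BLACK HEART
print([f"U+{ord(c):04X}" for c in with_presentation(heart, emoji=True)])
# ['U+2764', 'U+FE0F']
print([f"U+{ord(c):04X}" for c in with_presentation(heart, emoji=False)])
# ['U+2764', 'U+FE0E']
```

Appending rather than replacing keeps the base character intact, so software that ignores variation selectors still sees the same underlying character.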
Q: What about characters whose names include WHITE or BLACK?
A: Names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of “black” and “white” in the names is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. Similarly, in other symbols such as the hands U+261A BLACK LEFT POINTING INDEX and U+261C WHITE LEFT POINTING INDEX, the words “white” and “black” also refer to outlined versus filled, and do not indicate skin color. [PE]
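Character names are fixed identifiers in the Unicode Character Database, not rendering instructions. A short lookup with Python's standard unicodedata module (a sketch; the square code points U+25FC and U+25FB are supplied here and are not given in the answer above) lists the filled/outlined pairs mentioned:

```python
import unicodedata

# (filled, outlined) pairs: "BLACK" marks the filled form, "WHITE" the outlined one.
PAIRS = [(0x25FC, 0x25FB), (0x261A, 0x261C)]

for filled, outlined in PAIRS:
    for cp in (filled, outlined):
        # The name is only a label; it says nothing about the color used to draw the glyph.
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
# U+25FC BLACK MEDIUM SQUARE
# U+25FB WHITE MEDIUM SQUARE
# U+261A BLACK LEFT POINTING INDEX
# U+261C WHITE LEFT POINTING INDEX
```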
Q: What about other colors in the name?
A: Other colors in names, such as BLUE HEART or ORANGE BOOK, are the recommended appearance when the characters are rendered in color. (The black and white images in the Unicode charts use various shading techniques as a stand-in for color.)
Q: What is the difference between emoji and dingbats?
A: Most of the characters in the Dingbats block are derived from a well-established set of glyphs, the ITC Zapf Dingbats series 100, which constitutes the industry standard “Zapf Dingbat” font currently available in most laser printers. Emoji and dingbats have some similarities (and a few core emoji characters are mapped to characters in the Dingbats block). However, while there is often a great deal of flexibility in the range of glyph shapes that may be used for presentation of emoji, most characters in the Dingbats block are expected to be presented with glyph shapes that closely align with those shown in the Unicode Standard. [PE]
Q: How do emoji relate to other Japanese symbol sets?
A: Other symbol sets defined in Japanese standards overlap extensively with the characters in the core emoji set. For example:
Many characters from the Japanese television standard ARIB STD-B24 2007 (from the Association of Radio Industries and Businesses) were added to Unicode in Version 5.2, and are mapped to characters in the core emoji set.
The Japanese recording industry standard RIS-506-1996 specifies an extension of Shift-JIS for use in Music CD text, and includes a number of characters similar to those in the core emoji set. [PE]
Q: What about Wingdings and Webdings? Are they encoded?
A: The symbols in Microsoft’s Webdings and Wingdings series fonts are all in Unicode as of Version 7.0.
Q: Does the Unicode Consortium design the emoji used on my phone and elsewhere?
A: No, the Unicode Consortium does not design emoji. The emoji encoded in the Unicode Standard were added because they were in prior use as smartphone characters for text messaging in a number of Japanese manufacturers’ corporate standards, and in other places.
Q: I’d like my favorite emoji added to my phone. Can the Unicode Consortium add it?
A: The Unicode Consortium does not make or sell fonts, images, or icons. For concerns about the emoji and flag symbols available in any particular application or mobile platform, please contact the manufacturer. Their software determines what characters are available on your device.
The Unicode Consortium encourages the use of embedded graphics where possible, since they allow much more freedom of expression. For example, see phone icons.
Q: How can I get the Unicode Consortium to add a Unicode emoji?
A: Adding characters to an encoding standard involves a long, formal process. To be considered, characters must be in widespread use, as textual elements. The emoji and various symbols were added to Unicode because of their use as characters for text-messaging in a number of Japanese manufacturers’ corporate standards, and other places. If you wish to submit emoji or any other character for consideration for encoding, see the detailed instructions about how to submit character encoding proposals. It may be helpful to see the Unicode Forum or the Unicode Mail List, as well.
Q: Why is the process so long and complicated?
A: Unicode is the foundation for all modern software: that’s how all mobile phones, desktops, and other computers represent all text of every language. You are using Unicode every time you type a key on your phone or desktop computer, and every time you look at a web page or text in an application.
It is thus very important that the standard be stable, and that every character that goes into it be scrutinized carefully.
Q: Why can’t I find my national flag in my mobile application or on my smartphone?
A: For concerns about the emoji and flag symbols available in any particular application or mobile platform, please contact the manufacturer. Their software determines what characters are available on your device.
Q: But the Unicode Standard includes other flags, why don’t you include my flag?
A: The Unicode Standard does not encode single characters directly representing the symbols, icons, or images for any national flag. It does encode a set of regional indicator symbols. These can be used in pairs to represent any territory that has an ISO 3166-1 two-letter code, such as “DE” for Germany. The pairs are typically displayed as national flags: there are currently 249 such combinations.
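The pairing rule can be sketched in Python: each ASCII letter A to Z maps to the regional indicator symbol at the same offset from U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A. The function name flag_sequence is hypothetical, and the sketch only checks that the input is two ASCII letters, not that it is an assigned ISO 3166-1 code; whether a font actually shows the pair as a flag remains up to the device.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def flag_sequence(region: str) -> str:
    """Map a two-letter code such as "DE" to its regional indicator pair.

    Hypothetical helper: validates only the shape of the code (two ASCII
    letters), not whether ISO 3166-1 actually assigns it.
    """
    region = region.upper()
    if len(region) != 2 or not all("A" <= c <= "Z" for c in region):
        raise ValueError(f"not a two-letter code: {region!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in region)


print([f"U+{ord(c):X}" for c in flag_sequence("DE")])
# ['U+1F1E9', 'U+1F1EA']
```

Because the flag is a sequence of two ordinary characters rather than a single code point, adding a territory code requires no new encoding, only font and input-palette support.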
However, the Unicode Standard itself does not prescribe which regional indicator pairs are represented with flags in fonts and input palettes on any given device. Please see Section 22.10, Enclosed and Square, in the Unicode Standard.