Re: Emoji: emoticons vs. literacy

From: Jukka K. Korpela
Date: Mon Dec 29 2008 - 13:04:21 CST

Asmus Freytag wrote:

> The problem stems from the fact that in this kind of scenario 8) is no
> longer unique in the encoding sense.

Pardon? Now I'm _really_ confused. Why should it be "unique"? Is any
sequence of ASCII characters "unique" in the sense that it has only one
possible semantic interpretation? Or even any single character?

> In order to determine whether
> text containing 8) intends to encode the digit eight followed by the
> close paren or in fact intends to encode an emoticon you now need out of
> band information.

Yes, or a good guess. But how is this different from interpreting sequences
of characters in general? The string "88" might be intended to mean an
integer in decimal notation, or in hexadecimal notation, or it could be just
a string of digits used as a code or a label, or it could be someone's

> Requiring out of band information for text content
> is certainly not ideal. Therefore, if there were dedicated character
> codes for emoticons (especially those using short, and therefore commonly
> occurring strings of punctuation marks as fallbacks) the ability to
> use them as a unique way to encode common emoticons would be a
> definite benefit.

You could say the same, with _much_ higher practical motivation, about many
other strings. For example, should we also have characters corresponding to
commonly used strings with special meanings, such as "***", "1234" (commonly
used generically to denote a 4-digit string), "---" or longer (to denote a
horizontal line), or "//" or "°C" or "km"? Oops, some of these already exist
as Unicode characters - do I need to say more?

If you are saying that characters should be interpretable without out-of-band
information, shouldn't you start worrying about individual characters,
like "." (which could be a decimal point, a full stop, a separator of
fields, or something else) and "I" (which could be a "normal" letter, or the
uppercase equivalent of Turkish dotless i, or the Roman numeral one)?
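A short Python sketch (added for illustration, not from the original post) of the ambiguity of "I" and ".": three different sources all end up as the same U+0049 LATIN CAPITAL LETTER I, and the reverse mapping cannot tell them apart. It relies only on the standard `unicodedata` module and Python's built-in, locale-independent case mappings; the variable name `sources` is just a label for this example.

```python
import unicodedata

# Three different origins that all yield U+0049 LATIN CAPITAL LETTER I.
sources = {
    "Latin small i, uppercased": "i".upper(),
    "Turkish dotless i (U+0131), uppercased": "\u0131".upper(),
    "ROMAN NUMERAL ONE (U+2160), NFKC-normalized": unicodedata.normalize("NFKC", "\u2160"),
}

for label, result in sources.items():
    print(f"{label}: {result!r} (U+{ord(result):04X})")

# Going back is lossy: Python's default case mapping is not locale-aware,
# so "I" always lowercases to dotted "i" and the Turkish reading is lost.
print("I".lower() == "i")       # True
print("I".lower() == "\u0131")  # False

# Likewise for ".": only the consumer decides whether it is a decimal
# point or a field separator.
print(float("3.5"))             # 3.5
print("a.b.c".split("."))       # ['a', 'b', 'c']
```

Nothing in the character itself says which reading was intended; that knowledge comes from context, just as with "8)".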


