From: Asmus Freytag (firstname.lastname@example.org)
Date: Tue Dec 30 2008 - 17:39:24 CST
On 12/30/2008 1:15 PM, Jukka K. Korpela wrote:
> Asmus Freytag wrote:
>> Originally, emoticons started as a punny way of using punctuation.
> I don’t think that’s an accurate description. Emoticons are, by their
> nature, special symbols rather than punctuation, even when composed of
> punctuation marks.
Emoticons (*not* emoji) started as :) for smiley, etc. These were just
strings of punctuation characters selected for their visual similarity
to a line drawing (using 90 degree rotation as well). That's what I mean
by "punny way of using punctuation". More visual, but in nature not so
far removed from "lol" and "AFAIK" and similar devices. At that level,
they are clearly plain text (they were used in eminently plain text
interchange) and correctly encoded as ASCII -- they were simply a form
of ASCII usage.
Only later were actual 2-D images, not requiring rotations, associated
with these strings, and supported by application software.
> You might compare a smiley to a question mark especially in languages
> where question marks are the _only_ way of distinguishing a question
> from a statement. Yet, far more often, emoticons are just something
> supposedly funny, comparable to drawings.
I think viewing emoticons globally as "just drawings" is not very
helpful. Certainly the most common of them are used more like an
extended set of punctuation symbols. (This discussion has compared them
to cantillation marks and similar devices - all much better
classifications than mere "drawings", but no need to repeat the earlier
>> However, nowadays most users of these things pick them
>> from a list of symbols
> Is that the real reason for the discussion, or is the real reason what
> John Hudson wrote: that some companies transmit emoticons as
> characters in a nonstandard encoding?
What John refers to are the "emoji". While the emoji contain some of the
same symbols that are used for "emoticons", these two sets are not the
same thing. Emoji (the Japanese set) are encoded using single character
codes (SJIS extension). Emoticons, currently, are encoded using strings
of mostly punctuation marks (ASCII). I'm discussing emoticons here.
>> What used to be a punny way of
>> using punctuation has become de-facto markup for text elements. Just
>> like the TeX markup for mathematical symbols, or &gt; in HTML.
> No, it’s not markup. Whatever it is, it is not special notations that
> enclose text characters. Entity references like &gt; might be called
> markup, but they are really auxiliary notations - and ”&gt;” is actually
> never needed, it’s used just for symmetry with ”&lt;”, which is needed
> because ”<” as such is really markup-significant, tag start character.
That's digressing and therefore irrelevant. The current practice is
using ":)" or "8)" or ":evil:" in contexts where the sender inserts them
by selecting a picture from a list and the receiver sees that picture
inserted into the text stream. That makes ":)" etc. function like *markup*.
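
To make the "markup" point concrete, here is a minimal sketch in Python of what such an application might do on the receiving side. The table, the picture names and the render() function are all made up for illustration; no particular messenger or forum software is being described.

```python
import re

# Hypothetical table: ASCII emoticon string -> name of the picture shown instead.
PICTURES = {":)": "smile", "8)": "cool", ":evil:": "evil"}

# Try longer strings first so that ":evil:" is matched as a whole.
_PATTERN = re.compile(
    "|".join(re.escape(s) for s in sorted(PICTURES, key=len, reverse=True))
)

def render(text):
    """Replace each known emoticon string with a placeholder for its picture."""
    return _PATTERN.sub(lambda m: "[" + PICTURES[m.group(0)] + "]", text)

print(render("See you tomorrow :)"))  # See you tomorrow [smile]
```

The reader only ever sees the picture; the ASCII string is processed much as a tag or an entity reference would be.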
>> However, the use case here is that the
>> display is fixed, and it's up to the user to make the distinction.
> Once again, I don’t follow. What’s ”fixed”? There’s nothing fixed in
> ”8)” as far as I can tell. It’s a two-character string, which has many
> interpretations and many renderings.
The situation I described is where the application supports the display
of the emoticon symbol, not the ASCII string. Few users, seeing the
symbol displayed in an inappropriate context, will be able to "guess"
what ASCII characters were really meant - to them, the message will be
compromised. (Even if you have a handy switch to turn off emoticon
display, I bet few users will know what to do with it - I also bet many
of them will not know why there's suddenly an 8) in their text).
That's because the current practice is no longer predominantly that of
users typing punny punctuation strings, but that of selecting symbols
from lists, and seeing these symbols displayed as if they were entities
(except that they are more colorful in their rendering). These users no
longer intend to write ASCII "8)"; they intend to write a symbol. To them
"8)" is just as much markup-gobbledygook as &gt; or . That's the
>> No, I'm not arguing for unlimited semantic encoding. Unicode's design
>> point is that the display on the receiving end can unambiguously
>> confer the intent of the author in terms of the identity and ordering
>> of the written symbols.
> Where does the Unicode Standard state this?
> According to Wiio’s law, all communication fails, except by accident.
> There is absolutely no way to guarantee that a string of characters
> gets interpreted ”as intended”, or really any way to absolutely know
> what was intended. And even more certainly, it cannot be guaranteed at
> the level of coding characters.
You are stumbling over "interpretation" here. That word is used in a
funny way in the Unicode standard. It does not refer to interpreting the
whole of the text, but interpreting something as a character. And
Unicode is indeed about making sure that sender and receiver interpret
codes as characters in the same way. Wiio's law is beside the point here.
> If you mean that it must be possible to indicate the meaning of
> something as an emoticon symbol, then I think we are back to the
> question whether such symbols, as independent characters and not a
> play on characters, are used.
> Shouldn’t this be quite independent of the question whether they have
> ASCII ”fallbacks” or imitations or origins? Yet we are stuck with the
> confused issue of ”emoticons” that are ASCII strings on one side and
> ”independent” characters on the other.
No, in the Unicode context, if you interpret ":)" to mean the character
"smiley", then you are no longer interpreting it as two ASCII
characters. For HTML to interpret &gt; as ">" is fine, because it's a
clearly defined protocol, with announcement mechanisms. For general
text, the use of ":)", or worse "8)" presents an ambiguity, precisely
because of the absence of clear protocol definition or announcement
mechanism.
If you used U+263A instead of ":)" then your text would no longer be
ambiguous on the character level.
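
A small Python sketch of the difference, under stated assumptions: the function name and the sample sentences are invented for illustration, and the naive substring test stands in for whatever matching a real application performs. The only standard facility used is the unicodedata module.

```python
import unicodedata

def contains_cool_emoticon(text):
    """Naive check: does the ASCII string "8)" occur anywhere in the text?"""
    return "8)" in text

intended = "Nice sunglasses 8)"
accidental = "Please review items 7) and 8) before Friday"

# Both sentences match, although only the first one meant an emoticon.
print(contains_cool_emoticon(intended), contains_cool_emoticon(accidental))

# U+263A is a single code point whose identity does not depend on context.
SMILEY = "\u263A"
smiley_name = unicodedata.name(SMILEY)
print(len(SMILEY), smiley_name)
```

On the character level the receiver cannot tell the two "8)" cases apart, whereas U+263A is identified as WHITE SMILING FACE wherever it occurs.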
Encoding additional emoticons would have the potential benefit of
allowing users (and applications) to sidestep the ambiguous ASCII
markup. The potential benefit would be most felt where emoticons are
part of the most commonly used subset, and where their ASCII markup is
the most prone to misinterpretation or confusion with regular text (e.g.
"8)" or "B)" or similar).
This archive was generated by hypermail 2.1.5 : Fri Jan 02 2009 - 15:33:07 CST