What is a character? (was RE: Emoji: emoticons vs. literacy)

From: Peter Constable (petercon@microsoft.com)
Date: Sat Jan 10 2009 - 12:02:43 CST


    [a long contribution, for whatever it's worth]

    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Jukka K. Korpela

    > As such, the interoperability argument is a strong one.
    > Yet, is this about _text_ interchange, and specially
    > plain text? Emoji symbols look like images, even though
    > their origin is in Ascii strings like ":-)". Do they
    > constitute an emerging writing system? Maybe. But it
    > looks more like an attempt to create a set of images,
    > to be referred to by their identifiers.

    First, a clarification: as has been pointed out earlier, the origin of these emoji is not in ASCII strings like ":-)"; that's a *separate convention* used to represent similar graphic expressions in a technologically-limited context. Clearly, the origin of both the encoded emoji symbol and the ASCII emoticon is the graphic smiley face.

    Now turning to the rest...

    Yes, this is about _text_ interchange, at least in the eyes of those who are willing to consider this proposal. For those that are strongly opposed, the basis is that these *should not* be considered text.

    Note: I use the subjunctive "should not be considered" rather than "cannot be considered" or "are not" because the de facto reality that nobody can debate is that there is a large user community out there that *is* treating them as text.

    And it is that reality that motivates those of us that are open to considering this set. Of course, in being open to that possibility, we bring up the question, "What is a character?" And it's how people think about that question that is blocking those strongly opposed.

    So, what about that question? I think everybody would agree that the best examples of characters are things like "a": the graphic elements that people use to write language, that get cut into type and printed in newspapers, books etc. Now, we're all also willing to consider things like math operators and dingbats (or, at least, nobody seems to be as concerned about the dingbats as they are about the emoji), so clearly our notion of character is not strictly limited to the criteria I gave above. The boundary is certainly broader, and I'd contend that the boundary is fuzzy: that the concept of character isn't defined as a set of inclusion criteria applied in a binary manner, but rather should be defined in terms of a set of prototypical criteria that may or may not all hold in any given instance of a character.

    Of course, there are things like BEL, DEL, DC1 and ESC that indisputably are characters, yet have nothing at all in common with "a" -- nothing, that is, except that they get exchanged between devices and processes in the same protocols with the same kind of representation.

    Now, controls present a choice to us: either we have one prototype for character with very little in the way of criteria that must be true of all characters, or we have two prototypes: one for graphic character, and another for control character.
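
    For what it's worth, the two-prototype view roughly matches how the Unicode Character Database already tags code points: the controls named above carry General Category Cc, while "a" does not. A minimal Python sketch using the standard `unicodedata` module follows; the `is_control` helper and the "graphic" label for everything that is not Cc are my own simplifying assumptions, not definitions from the standard.

```python
import unicodedata

# The controls mentioned above, plus "a" as the prototypical graphic character.
samples = {"BEL": "\x07", "DC1": "\x11", "ESC": "\x1b", "DEL": "\x7f", "a": "a"}


def is_control(ch: str) -> bool:
    """Hypothetical helper: True when the General Category is Cc (control)."""
    return unicodedata.category(ch) == "Cc"


# Assumption for illustration only: anything that is not Cc is called "graphic".
# The real category set is richer (format characters, separators, and so on).
kinds = {label: ("control" if is_control(ch) else "graphic")
         for label, ch in samples.items()}

for label, kind in kinds.items():
    print(label, kind)
```

    This says nothing about where the fuzzy boundary of "graphic character" should fall; it only shows that the control/graphic split is already recorded as a property.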

    If we were to take the former approach, then that leads directly to the position of those open to encoding the emoji: the only mandatory criteria for character would seem to be that they be units of information that we choose to represent as "encoded characters" and exchange in "text" protocols. But that just turns the question about the definition into a different, operational question: On what basis do we make those choices?

    But let's consider the latter approach: we have a prototype for control character and a prototype for graphic character. In the case of emoji we're clearly interested in the prototype definition of graphic character. If the boundaries are fuzzy, then we need to choose which things are in and which are out. Yet, we come to the same question as above: On what basis do we make those choices?

    Either way, we come to a not-too-surprising point: the set of graphic characters includes whatever graphic elements we collectively choose to include. Of course, we just aren't all in agreement on how to choose.

    We don't disagree on "a" or on “∈” (U+2208 element of); we all accept (or, at least, tolerate) “✂” (U+2702 black scissors) and “𝅘𝅥𝅮” (U+1D160 eighth note). But when it comes to RAINBOW (e-00D, attested cases of which are polychromatic) or DANCER (e-1BD, attested cases of which are animated), there is disagreement.
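
    As a rough illustration of how far the already-accepted cases sit from "a", the sketch below looks up the character name and General Category for the encoded characters mentioned above, using Python's standard `unicodedata` module. The `describe` helper is a hypothetical name of mine. The emoji are left out, since the e-numbers are proposal identifiers rather than code points.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Hypothetical helper: (code point label, General Category, character name)."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


# "a", ELEMENT OF, BLACK SCISSORS, MUSICAL SYMBOL EIGHTH NOTE
examples = ["a", "\u2208", "\u2702", "\U0001D160"]

for ch in examples:
    print(*describe(ch))
```

    The letter is category Ll, the math operator Sm, and the scissors and the note both So ("Symbol, other"), which is one way of seeing that the accepted repertoire already extends well past the prototype.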

    That disagreement isn’t too surprising: those graphic objects diverge pretty strongly from the prototype, things like “a”, because of the polychromatic and animated qualities. An important point is that nobody here, I think, disagrees that these are a long way away from the prototype and, at the least, strain the limits. The difference is that one group sees these as going well beyond the limit, while the other is willing to see them as just straining the limits and to tolerate that strain (and intends to mitigate it to some extent by abstracting some of the concepts to be encoded). And the latter group feels that this must be done because of the de facto reality mentioned above: these things *are* being treated as text in public interchange.

    There is another aspect of the disagreement that’s interesting. Consider e-014, CRESCENT MOON: if someone came with a proposal to encode just that character, giving examples of attested usage in-line in text, then I suspect nobody would oppose it. Yet, here it is as part of a large set of graphic elements, and it seems that some will now oppose it for that reason. Note that this is a separate issue from the distance-from-the-prototype issue, though I think it is conditioned by that other issue. (If someone came with a large set of static, monochromatic graphic elements attested in usage in text, I doubt there’d be the same opposition from those opposing the emoji.) This also is probably not surprising, as it leads us to be wary of the potential for more stuff that strains / exceeds the limits in the future. It seems that some don’t want to take that risk, while others are willing to take the risk and encode things like CRESCENT MOON that don’t stray too far from the prototype (even if not some of the other emoji). Certainly those open to considering the proposal are open to that risk, and I think that’s because (a) they think the aforementioned de facto reality of existing public text interchange means that these *should* be encoded, and (b) they think there’s reasonable hope to contain this.

    > Color and animation are essential issues, and I find it
    > odd that it has not been commented by those that favor
    > the introduction of emoji as characters.

    I'd disagree that it has not been commented on by that group. Certainly I've commented on it above: it is an issue, and those open to or favouring encoding the emoji will certainly have to come to grips with what to do about it; and I anticipate there will be a fair amount of discussion about that when the completed proposal is discussed by the Symbols sub-committee and by UTC next month, and before that as the proposal is getting drafted.

    > For emoji, current and future, being inherently
    > graphic and iconic, it would be odd to exclude the
    > possibility of making distinction between symbols solely
    > on the basis of their colors or motion.

    I suspect that's a point that the Symbols subcommittee and UTC likely will end up discussing.

