Re: Emoji: emoticons vs. literacy

From: André Szabolcs Szelp (
Date: Mon Jan 05 2009 - 15:13:29 CST


    2009/1/3 Asmus Freytag <>:
    > On 1/3/2009 2:06 AM, Ruszlan Gaszanov wrote:
    >> Peter Constable wrote:
    >>> I don't mean just communicated between different vendors' processes, but
    >>> also interpreted and processed by different vendors' processes, in contexts
    >>> where no private agreement can be assumed. If text content is getting
    >>> generated in (say) DoCoMo text protocols, spreading into other content via
    >>> other protocols and then that content is getting interpreted by processes
    >>> produced by Google or Apple or whomever, then the sense in UTC (I think I
    >>> can say) is going to be that that is *public* interchange, hence presenting
    >>> a case for being representable in the UCS.
    >> The fact that now not 3 but 5 vendors are using those PUA conventions does
    >> not necessarily make it "public interchange". As I see it, the usage is
    >> still restricted to the limited number of specific vendors.
    > What's the magic number at which things become "public" in your take? 6
    > vendors? 60 vendors? 600 vendors? 6000 vendors?
    > In that context, it's worth remembering that the two emoticons (sic) that
    > have been encoded in Unicode forever at WHITE/BLACK SMILING FACE exist
    > because of a single vendor's character set: IBM's code page 437 (and its
    > descendants).
    > Telephone text messages are not a closed system, because telecoms typically
    > provide means to connect to incoming and outgoing email at the minimum. You
    > can expect these codes to leak onto the web in due course, if this is not
    > happening already. Whatever the mechanism for that leakage, what Peter is
    > rightly objecting to is a world where text in open interchange needlessly
    > contains units that are un-interpretable.
    > It doesn't matter whether one or two vendors are causing this - as long as
    > their system isn't *closed*, it's not true private interchange.

    As if _any_ private PUA usage would be closed in the internet age!
    When scholars define obscure mediaeval characters in the PUA for their
    manuscript transcriptions, then interchange them and publish them on
    the web, that is not a "closed system" either. Klingon, or for that
    matter any invented script, will be "leaked", potentially indexed,
    &c., &c. This holds even for a script used only by a handful of fans
    running an online forum (bulletin board), e.g. tech-savvy teenage boys
    inventing their own sci-fi or fantasy cypher script, to parallel
    James' Japanese schoolgirls example ;-).

    "Private" does not mean "secret".
    "Private" in the context of the PUA means something that lacks either
    the resources OR the validity (based on existing principles) to be
    encoded as proper Unicode entities, but which a well-defined set of
    participants (which can be 6000 scholars or 6 mobile operators) still
    want to transmit in a plain-text-like protocol.

    I had thought this was plainly obvious, before this (pretty weird)
    discussion kicked off on this list.

    I was an avid (and blind) follower of the Unicode Spirit. Well, I am
    still convinced by Unicode 5.1, but I am disappointed by the
    "flexibility" (i.e. nonexistence) of the principles of those who
    govern it. For me, "standards" were always about *stability*. And
    stability is being lost.

    Honestly, where will you draw the line? Will every single image I can
    come up with be encodable in Unicode, as long as I assign it a PUA code
    point in some document and publish a website using that code point,
    with a description in Latin script next to it explaining the context
    of the image? After all, the PUA code point is leaked onto the net and
    might be indexed by Google. In a hundred years, nobody might be able to
    determine exactly how it links to the image "Mr. XY posing in front of
    the Eiffel Tower". Will any private character invention that is
    published on the internet be encodable, and then encoded?

    And now to an (IMHO) very important point:

    Actually, even in the domain of emoji, how do you define character
    identity? How do you know that "Chick" is a different character entity
    from "Hatching Chick"? How do you know they are not mere *glyph
    variants* of the character FLEDGELING?? Having been assigned different
    private JIS codes in the operators' private standard does not make
    them different characters. We have seen this before with pre-existing
    Arabic standards, which had a code point for every positional variant,
    and with the previously cited Chinese national standard that uses the
    PUA for precomposed Tibetan glyphs! The same goes for "Red Heart",
    "Purple Heart", &c., &c. How do you know they are not mere
    presentational/glyph variants of the character HEART (already encoded:
    U+2661, U+2665, U+2764)? They may simply have been assigned different
    code points in a standard that is not aware of the character-glyph
    model. (There have been plenty of standards that did not make this
    distinction and encoded glyphs rather than characters; how do you know
    the present private emoji encoding is not one of them?)


    This archive was generated by hypermail 2.1.5 : Mon Jan 05 2009 - 15:17:45 CST