Re: Emoji: emoticons vs. literacy

From: Michael D'Errico (mike-list@pobox.com)
Date: Tue Jan 13 2009 - 17:20:29 CST


    In a discussion regarding the possibility of assigning 26 code points to
    be used in pairs to encode country flags, such as <FLAG C, FLAG A> to
    specify the Canadian (CA) flag, Michael Everson wrote:

    > Not even MILDLY tempting as an encoding model.

    It took a while to figure out how we could be in such disagreement, but
    I think I finally did. While I think of Unicode more in terms of an
    information communication protocol, Michael probably thinks of it more
    in terms of information display/rendering. I base this assumption on
    the fact that he is heavily involved in font development.

    What I like about the pair model is that it requires only 26 code point
    assignments, yet can represent the equivalent of the XML <flag>CA</flag>
    in plain text. The code points themselves carry the "flag-ness",
    so this information is available even to a plain-text process. If two
    code points were not enough to specify every country or area, as was
    suggested for CYM, then three or more code points could be used to
    accommodate them (with no additional assignments).
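
    As a rough sketch of the pair model, the Python below maps a letter
    code to a run of FLAG letters and recovers the codes from plain text
    without any markup parser. The base code point FLAG_A (taken from a
    private use area) and the function names are assumptions made for
    illustration; no such assignment is given in this thread.

```python
# Hypothetical base: first of the 26 proposed "FLAG letter" code points.
# A Supplementary Private Use Area-A value is used purely for illustration.
FLAG_A = 0xF0000


def encode_flag(code: str) -> str:
    """Map a letter code such as 'CA' or 'CYM' to a run of FLAG letters."""
    code = code.upper()
    if not (code.isascii() and code.isalpha()):
        raise ValueError(f"not a letter code: {code!r}")
    return "".join(chr(FLAG_A + ord(c) - ord("A")) for c in code)


def is_flag_letter(ch: str) -> bool:
    """True if ch is one of the 26 hypothetical FLAG letters."""
    return FLAG_A <= ord(ch) < FLAG_A + 26


def decode_flags(text: str) -> list[str]:
    """Return the letter code for each maximal run of FLAG letters."""
    runs, current = [], []
    for ch in text:
        if is_flag_letter(ch):
            current.append(chr(ord(ch) - FLAG_A + ord("A")))
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


message = "Hello " + encode_flag("CA") + " and " + encode_flag("CYM")
flags = decode_flags(message)
```

    The same 26 code points cover 2-letter and 3-letter codes. In this
    sketch two flags written with nothing between them would merge into
    one run, so a real design would need a rule for splitting runs.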

    The alternative that Michael prefers is to encode every pair of letters:
    FLAG AA, FLAG AB, ... FLAG ZZ, for a total of 676 assigned code points.
    The advantage is ease of rendering, since you can simply look up the
    glyph given the code point. There is a cost, though: you spend an extra
    650 code points (676 - 26) to provide the same amount of information.
    Given Whistler's Conjecture, this can be
    rationalized away, though it should be a decision made knowing that it
    is a rendering optimization. The 676 code points are equivalent to the
    following XML: <flag-aa />, <flag-ab />, etc. so flag-ness is also
    conveyed in plain text. The drawback is that it is limited to 2-letter
    codes, so if more were needed, a different solution would be required.
    I doubt anyone would suggest assigning all 3-letter combinations
    (26^3 = 17,576).
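
    For comparison, a minimal sketch of the precomposed model: each
    2-letter code maps to one code point by simple arithmetic, which is
    what makes the glyph lookup trivial. The base FLAG_AA is a
    hypothetical private-use value chosen for illustration, not a
    proposed assignment.

```python
# Hypothetical code point for FLAG AA; FLAG AB follows it, and so on to FLAG ZZ.
FLAG_AA = 0xF0100
LETTERS = 26


def precomposed_flag(code: str) -> str:
    """Map a 2-letter code to its single precomposed flag code point."""
    code = code.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not a 2-letter code: {code!r}")
    first, second = (ord(c) - ord("A") for c in code)
    return chr(FLAG_AA + first * LETTERS + second)


def precomposed_code(ch: str) -> str:
    """Recover the 2-letter code from a precomposed flag code point."""
    offset = ord(ch) - FLAG_AA
    if not 0 <= offset < LETTERS ** 2:
        raise ValueError("not a precomposed flag code point")
    first, second = divmod(offset, LETTERS)
    return chr(ord("A") + first) + chr(ord("A") + second)


pairs = LETTERS ** 2       # code points needed for all 2-letter flags
extra = pairs - LETTERS    # cost relative to the 26-letter pair model
triples = LETTERS ** 3     # what 3-letter codes would require

print(pairs, extra, triples)
```

    The counts match the figures above: 676 assignments, 650 more than
    the pair model, and 17,576 if the scheme were extended to three
    letters.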

    For completeness, I should also mention Philippe's suggestion to use
    HTML such as <img src="flag-CA.svg" />. There are numerous problems
    with this approach: first, you need an HTML parser to even determine
    that something special needs to be done to display the flag; second,
    all the HTML parser can determine is that an image is embedded (could
    be of anything); third, the only possible way of determining that a
    flag is in the image is to parse the URL/filename and hope that it
    follows some convention to tell you which country's flag it is. So
    clearly this is not a reliable way to represent a country flag in
    HTML, much less plain text.
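
    To make the three problems concrete, here is a sketch of what a
    process would have to do to find a flag in such HTML. It uses
    Python's standard html.parser; the "flag-XX.svg" naming pattern is
    only the convention guessed from the example above, which is
    exactly the weak point.

```python
import re
from html.parser import HTMLParser

# Assumed filename convention (flag-XX.svg); nothing in HTML guarantees it.
FLAG_FILE = re.compile(r"(?:^|/)flag-([A-Za-z]{2,3})\.svg$")


class FlagImageFinder(HTMLParser):
    """Collect country codes guessed from <img> source filenames."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def handle_starttag(self, tag, attrs):
        # The parser can only report that an image is embedded.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        match = FLAG_FILE.search(src)
        if match:
            self.codes.append(match.group(1).upper())


def guess_flags(html: str) -> list[str]:
    """Parse HTML and return the codes of images that look like flags."""
    finder = FlagImageFinder()
    finder.feed(html)
    finder.close()
    return finder.codes


found = guess_flags('<p>Greetings from <img src="flag-CA.svg" /></p>')
missed = guess_flags('<p>Greetings from <img src="images/canada.png" /></p>')
```

    The second call finds nothing even though the image may well be the
    same flag, because the filename does not follow the assumed pattern.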

    In summary, I am OK with the rendering optimization Michael advocates,
    though it is a special case since it is limited to 2-letter country
    codes. In the future, if a similar encoding challenge arises that needs
    more letters in combination, that approach would not be acceptable.

    Mike


