Re: Emoji: emoticons vs. literacy

From: Doug Ewell (doug@ewellic.org)
Date: Sat Jan 03 2009 - 11:37:02 CST


    James Kass <thunder dash bird at earthlink dot net> wrote:

    > Private Use Area just means user-defined area. There's nothing secret
    > or damaging about user-defined characters, whether they be suitable
    > potential candidates for standard plain-text, or whether they are
    > destined to remain banished in the phantom zone for all eternity.
    > There will always be people wishing or needing to exchange
    > user-defined material, and there's nothing wrong with that. They are
    > using the PUA correctly.

    There seems to be a school of thought that private-use characters are
    inherently evil and should never be used, except perhaps within one's
    own personal system. The thinking seems to be that people will want to
    search for these things and interoperability will be broken, and also
    that "private agreement" implies a certain degree of secrecy and
    extremely limited use.

    Around 1993 it seemed to me both obvious and brilliant that Unicode
    would provide a private-use area as part of its overall strategy:
    encode the most commonly used characters rather than any old thing
    imaginable, and let users who wanted to represent any old thing
    imaginable within the Unicode architecture encode it as a private-use
    character. I was not familiar with the East Asian encodings at the time
    and did not know that they also supported this useful mechanism.

    Over time, the principle of "most commonly used characters" in Unicode
    expanded to include ancient scripts, musical symbols, and mathematical
    font variants, as well as just about every Han character that someone
    could dredge up instead of just the ones in existing standards. But the
    PUA principle remained: you could still encode the Apple logo or Klingon
    or Ewellic in the PUA, and reap the benefits of the Unicode architecture
    without contaminating the Standard's repertoire.

    At some point, perhaps with the rise of the Internet and powerful search
    engines, the idea began to spread that using PUA characters was always
    bad, because of the potential for conflict between different private
    agreements -- as if that possibility had not occurred to anyone before.
    I search for a document containing U+E690 and MegaFinder locates one for
    me, but my interpretation of U+E690 might differ from the one used by
    the author of the document. The private agreement is not transmitted
    along with the document. Supposedly this will cause great
    interoperability problems if I am not intelligent enough to understand
    that this is the nature of private-use codes.
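    A minimal sketch of the point above, assuming Python's standard
    unicodedata module. The only thing the code point itself tells a reader
    is that its General_Category is Co (private use) and that it has no
    character name. The two "agreements" are hypothetical mappings invented
    for illustration, not real conventions.

```python
import unicodedata

# Hypothetical private agreements: the author's and the reader's.
# Both assign a meaning to U+E690, but the meanings differ.
AUTHOR_AGREEMENT = {0xE690: "HYPOTHETICAL SYMBOL ONE"}
READER_AGREEMENT = {0xE690: "HYPOTHETICAL SYMBOL TWO"}


def is_private_use(ch: str) -> bool:
    """True if the character's General_Category is Co (Private_Use)."""
    return unicodedata.category(ch) == "Co"


ch = "\uE690"
print(is_private_use(ch))                 # True
print(unicodedata.name(ch, "<no name>"))  # <no name>: PUA code points are unnamed
# Nothing in the text itself says which agreement the author intended.
print(AUTHOR_AGREEMENT[ord(ch)] == READER_AGREEMENT[ord(ch)])  # False
```

    The code point is perfectly searchable and well-formed; what does not
    travel with the document is the agreement that gives it meaning.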

    This school of thought has also carried over to the BCP 47 language
    tagging arena, where people can create tags like "x-piglatin", whose
    meaning should be obvious even without a written and signed "agreement,"
    and can also create "qaa" or "x-abc123", whose meaning would be far from
    obvious, and whose creator would have to be very naïve not to understand
    this. Despite a serious lack of evidence that private-use tags are
    causing a mainstream interoperability crisis, successive versions of BCP
    47 have added more and more warnings against using them.
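    For readers less familiar with BCP 47, a rough sketch in Python of how
    the private-use forms mentioned above can be recognized. It covers only
    two cases: a tag that is entirely private use ("x-" followed by
    subtags of one to eight letters or digits) and a primary language
    subtag in the reserved range qaa..qtz. It is not a full BCP 47
    validator; script and region subtags have their own reserved
    private-use ranges, which it ignores. The function name is hypothetical.

```python
import re

# Each private-use subtag is 1 to 8 ASCII letters or digits.
_PRIVATE_SUBTAG = re.compile(r"[a-z0-9]{1,8}")


def is_private_use_tag(tag: str) -> bool:
    """Rough check for a private-use BCP 47 language tag (sketch only)."""
    subtags = tag.lower().split("-")
    first = subtags[0]
    if first == "x":
        rest = subtags[1:]
        return bool(rest) and all(_PRIVATE_SUBTAG.fullmatch(s) for s in rest)
    # Reserved primary language subtags qaa..qtz: 'q', then 'a'..'t', then any letter.
    return (len(first) == 3 and first.isascii() and first.isalpha()
            and first[0] == "q" and "a" <= first[1] <= "t")


for t in ["x-piglatin", "qaa", "x-abc123", "en-US"]:
    print(t, is_private_use_tag(t))
```

    Note that the check says nothing about what a tag means: "x-piglatin"
    and "x-abc123" are equally valid, and only the former is guessable.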

    If you create an encoding standard of any sort, and include a
    private-use mechanism as a defense against having to encode every
    conceivable blob, and then turn around and discourage use of the
    private-use mechanism, the natural conclusion is that you will feel
    compelled to encode every conceivable blob.

    --
    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

