Re: Unicode Stability (Was: Re: E0000 Language Tags for Some Obscure Languages)

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Mar 02 2005 - 10:42:13 CST


    Elliotte Harold <elharo at metalab dot unc dot edu> wrote:

    >> ... Really, no opinions have ever changed that much regarding the
    >> Plane 14 tags. They were born as the red-headed stepchildren of
    >> Unicode; they were created only to prevent a particular protocol from
    >> using a mutant form of UTF-8 for language tagging.
    >
    > Which protocol was that?

    That would be ACAP. They needed a technique for plain-text language
    tagging, which ruled out a separate markup layer of the form <span
    lang="xx">...</span>.

    There was an Internet-Draft for something called "Multi-Lingual String
    Format" (MLSF), written by Chris Newman, that described the proposed
    mechanism in exemplary detail. Basically, it used illegal UTF-8
    sequences to represent language tags. It was easy to encode and decode,
    as designed. It was also incompatible with any form of Unicode other
    than UTF-8, and would have been rejected by existing UTF-8 processors
    that did not expect this "higher layer." It very possibly would have
    destroyed the stability, and consequently the widespread acceptance, of
    UTF-8.
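    To make the contrast concrete, here is a minimal Python sketch. It
    assumes the standard Plane 14 layout (U+E0001 LANGUAGE TAG followed by
    tag characters U+E0020..U+E007E that mirror printable ASCII) and shows
    that such a tag survives any encoding form. The "illegal" variant uses a
    hypothetical 0xFF marker byte purely for illustration; it is NOT the
    actual MLSF byte layout, which is described in the draft linked below.
    The helper names are hypothetical too.

```python
# Plane 14 tag characters: U+E0001 LANGUAGE TAG opens a tag, and
# U+E0020..U+E007E are tag versions of printable ASCII (offset 0xE0000).
LANGUAGE_TAG = "\U000E0001"
TAG_OFFSET = 0xE0000


def plane14_tag(lang: str) -> str:
    """Hypothetical helper: encode an ASCII language code as a Plane 14 tag."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("tag must be non-empty printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def strict_utf8_accepts(data: bytes) -> bool:
    """Hypothetical helper: True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = plane14_tag("ja") + "日本語"

# Tag characters are ordinary code points, so every encoding form round-trips them.
for enc in ("utf-8", "utf-16", "utf-32"):
    assert text.encode(enc).decode(enc) == text

# Hypothetical in-band tag built from a byte that never occurs in
# well-formed UTF-8 (0xFF). This is NOT the real MLSF encoding.
illegal_tagged = b"\xff" + b"ja" + "日本語".encode("utf-8")

print(strict_utf8_accepts(text.encode("utf-8")))  # True
print(strict_utf8_accepts(illegal_tagged))        # False
```

    The point of the sketch is the asymmetry: the Plane 14 form is just
    more characters, while the illegal-byte form exists only in UTF-8 and
    trips any decoder that enforces well-formedness.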

    You can read the draft here:

    http://xml.coverpages.org/draft-ietf-acap-mlsf-01.txt

    In my 2002 paper arguing for the non-deprecation and greater acceptance
    of Plane 14 tags, I attributed MLSF to "a group of CJK users" who wanted
    language tags to perform Han glyph selection. That assumption was
    clearly bogus.

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


