Plane 14 Tag Deprecation Issue (was Re: VS vs. P14 (was Re: Indic Devanagari Query))

From: Kenneth Whistler (
Date: Thu Feb 06 2003 - 14:54:04 EST


    Doug wrote:

    > Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
    > > Unicode 4.0 will be quite specific: P14 tags are "reserved for
    > > use with particular protocols requiring their use" is what the
    > > text will say more or less.
    > I didn't know the question of what to do about Plane 14 language tags
    > had already been resolved.
    > If that is the case, it might make sense to add an explanatory note to
    > the Public Review item on Plane 14 tags, or simply to remove the item.

    The issue up for public review, as it states, is about
    formal *deprecation* of the Plane 14 Language Tags.

    The UTC already has consensus on limiting the use and contexts
    of use of the language tag characters. Such language was written
    into Unicode 3.1:

      "The [language tag] characters... provide a mechanism for
       language tagging in Unicode plain text. <emphasis>However,
       the use of these characters is strongly discouraged.</emphasis>
       The characters in this block are reserved for use with special
       protocols. They are <emphasis>not</emphasis> to be used in
       the absence of such protocols, or with <emphasis>any</emphasis>
       protocols that provide alternate means for language tagging,
       such as HTML or XML. The requirement for language information
       embedded in plain text data is often overstated. ...

      "Because of the extra implementation burden, language tags should
       be avoided in plain text unless language information is required
       and it is known that the receivers of the text will properly
       recognize and maintain the tags...

      "Language tags should also be avoided wherever higher-level
       protocols, such as a rich-text format, HTML or MIME, provide
       language attributes."

    This language is carried forward, as with the rest of the
    Unicode 3.1 and Unicode 3.2 text, into the consolidated text
    of Version 4.0 of the standard.
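
    For readers who have not seen the mechanism itself, here is a
    minimal Python sketch, assuming the Unicode 3.1 scheme: a tag is
    U+E0001 LANGUAGE TAG followed by tag characters that mirror the
    printable ASCII letters of the language identifier at an offset
    of 0xE0000. The function names are hypothetical. strip_tags()
    only illustrates what becomes of the tag when a receiver does not
    "properly recognize and maintain" it.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000    # tag character = ASCII code point + 0xE0000
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007F  # TAG SPACE..CANCEL TAG


def make_language_tag(lang: str) -> str:
    """Build LANGUAGE TAG + tag-character mirror of a printable-ASCII language ID."""
    if not lang or any(not 0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language ID must be non-empty printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def read_language_tag(text: str):
    """Return the language ID of the first language tag in text, or None."""
    start = text.find(chr(LANGUAGE_TAG))
    if start < 0:
        return None
    ident = []
    for c in text[start + 1:]:
        cp = ord(c)
        if TAG_FIRST <= cp < TAG_LAST:  # U+E007F (CANCEL TAG) ends the ID
            ident.append(chr(cp - TAG_OFFSET))
        else:
            break
    return "".join(ident) or None


def strip_tags(text: str) -> str:
    """Drop U+E0001 and U+E0020..U+E007F, as a tag-unaware receiver might."""
    return "".join(
        c for c in text
        if ord(c) != LANGUAGE_TAG and not TAG_FIRST <= ord(c) <= TAG_LAST
    )


tagged = make_language_tag("en") + "Hello"
print(read_language_tag(tagged))  # en
print(strip_tags(tagged))         # Hello
```

    The tagged string is eight code points long, three of them
    invisible; nothing in the text itself tells a naive receiver that
    they carry meaning.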

    The UTC also long ago approved UTR #20, which states that
    language tags...

      "...were solely included for the benefit of those Internet
       protocols, such as ACAP, which require a standard mechanism
       for marking language in UTF-8 strings, and at the same time
       to avoid the use of other tagging schemes that relied on
       specific details of the encoding form used."
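
    The point about not relying on "specific details of the encoding
    form" can be seen in a small Python sketch (the helper tag() is
    hypothetical, and this is not a description of how ACAP itself
    handled tags): because the tag is just a sequence of characters,
    it survives unchanged whether the string is serialized as UTF-8
    or UTF-16; only the byte patterns differ.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag character = ASCII code point + 0xE0000


def tag(lang: str) -> str:
    """Hypothetical helper: LANGUAGE TAG followed by mirrored ASCII letters."""
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


tagged = tag("fr") + "Bonjour"

utf8 = tagged.encode("utf-8")
utf16 = tagged.encode("utf-16-be")

# Tag characters lie in Plane 14: four bytes each in UTF-8,
# a surrogate pair in UTF-16.
print(utf8[:4].hex())   # f3a08081
print(utf16[:4].hex())  # db40dc01

# Decoding either form yields the same character sequence, tag included.
assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == tagged
```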

    So what we are talking about here is not opening up again
    the wonderful world of what language tag characters are
    good for, and broadening their use.

    The issue on the table is:

      Because the UTC has determined that the use of language
      tag characters is to be strongly discouraged, and is limited
      in any case to very particular protocols, should the
      UTC take one step further and declare them formally
      deprecated?

    The result of the latter decision would be to add a statement
    to that effect in the block description in Unicode 4.0 for
    the language tag characters, and to add the code points
    U+E0001, U+E0020..U+E007F to the list of code points which
    get the Deprecated property in PropList.txt.
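
    Mechanically, the change amounts to a couple of data lines. The
    Python sketch below assumes PropList.txt's usual "code point or
    range ; property # comment" line format; the two sample lines are
    a hypothetical rendering of what the proposed entries might look
    like, not text copied from any released file.

```python
# Hypothetical PropList.txt excerpt in the file's
# "code point(s) ; property # comment" format.
SAMPLE = """\
E0001         ; Deprecated # Cf       LANGUAGE TAG
E0020..E007F  ; Deprecated # Cf  [96] TAG SPACE..CANCEL TAG
"""


def parse_proplist(text: str) -> dict:
    """Map property name -> list of (first, last) code point ranges."""
    props = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not data:
            continue
        cps, prop = (field.strip() for field in data.split(";"))
        first, _, last = cps.partition("..")  # single code point or range
        props.setdefault(prop, []).append((int(first, 16), int(last or first, 16)))
    return props


def has_property(props: dict, prop: str, cp: int) -> bool:
    """True if code point cp falls in any range listed for prop."""
    return any(lo <= cp <= hi for lo, hi in props.get(prop, []))


props = parse_proplist(SAMPLE)
print(has_property(props, "Deprecated", 0xE0041))  # True (TAG LATIN CAPITAL LETTER A)
print(has_property(props, "Deprecated", 0xE0000))  # False
```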

    That's it. That's what is on the table for comment and
    eventual decision by the UTC.

    My personal opinion? The whole debate about deprecation of
    language tag characters is a frivolous distraction from
    other technical matters of greater import, and things would
    be just fine with the current state of the documentation.
    But, if formal deprecation by the UTC is what it would take
    to get people to stop advocating more use of the language
    tags after the UTC has long determined that their use is
    strongly discouraged, then so be it.


    This archive was generated by hypermail 2.1.5 : Thu Feb 06 2003 - 15:30:20 EST