But E0000 Custom Language Tags Are Actually *Required* For Use By Unicode

From: UList@dfa-mail.com
Date: Wed Mar 02 2005 - 11:08:51 CST



    I'm getting the sense that people think I'm trying to do something special and
    unusual, possibly controversial, and certainly something they can choose to
    ignore, with the local versions of the Greek script.

    I think the following discussion may clarify the broader issue.

    1. The only correct way to handle the Serbian italic 't' is with a
    "language tag".

    2. The Serbian italic 't' definitely does not have separate characterhood.

    3. Since the Serbian italic 't' does not have separate characterhood (as
    opposed to being a genuine character that is simply too obscure for Unicode
    ever to assign a codepoint to), it does *not* belong in the PUA (other than as
    an emergency measure).

    4. The sole correct way to access this local variation of the Cyrillic script
    is with a "language tag".

    5. The local versions of the Greek script are *identical* in nature to this
    local version of the Cyrillic script. They are particular, local styles of the
    same international script system, as used by different Ancient Greek states.
    They are in fact referred to by the technical term "epichoric", meaning
    literally, "local".

    6. Therefore, the local versions of the Greek script do not have separate
    characterhood. They should never be assigned codepoints.

    7. The only correct way to access local variations of the Greek script is
    with "language tags".

    8. They are not characters that are merely too obscure ever to be assigned
    Unicode codepoints; therefore they do not belong in the PUA (other than as an
    emergency measure).

    9. The only correct way to access local variations of scripts, including
    variations of Cyrillic script, of Greek script, of Berber script, etc., is
    with "language tags".

    10. *But* I have previously shown, fairly obviously, that it is hardly
    practical for Microsoft to add long lists of OpenType "language tags" for
    something as obscure as extinct local variations of Greek script. It is
    certainly not practical for Microsoft to add lists of every possible local
    variation of every obscure script such as Berber.

    11. *Therefore*, some kind of "custom language tag" system is a
    *requirement* if Unicode is to function as it is claimed it is *intended* to function.

    12. This is not an obscure, personal desire of mine. It is an essential and
    inherent component of the approach Unicode itself has created (but perhaps
    failed to think through to its conclusion).

    13. Unicode has in fact created exactly this custom language tag system with
    the E0000 block: [LANGUAGE][x][-][custom_language_name][END LANGUAGE]. But
    this system has since been "strongly disrecommended", and is therefore not
    likely to be implemented by font technologies.

    14. THEREFORE, in order to make it actually possible to follow Unicode's *own*
    stated and vigorously defended philosophy on the sole correct means of
    accessing local script variants (for local script variants that are too
    obscure to receive official language tags), Unicode must do one of the following:

        A. Recommend the use of E0000 custom language tags, and their
    implementation by font technologies (or better, add an E0000 custom script tag).

        B. Make sure that some other higher-level "custom language tag" system is
    going to actually exist, usable in all font technologies, before shifting
    responsibility to it.

        C. Make sure that a means of accessing generic "alternate selection"
    features in all font technologies is actually going to exist, before shifting
    responsibility to it.

    15. OR, if Unicode will not or cannot do any of those things, Unicode must:

        A. Assign official Variation Selectors for all the local forms of all
    scripts (a monumental and controversial task), or,

        B. Create new user-definable Variation Selector-like codepoints, for use
    in selecting the obscure local variations of (obscure) scripts.
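    For readers unfamiliar with the mechanisms in points 13 and 15, here is a
    minimal Python sketch of how such sequences are spelled in plain text. It
    assumes the Plane 14 tag characters (U+E0001 LANGUAGE TAG, U+E0020..U+E007E
    mirroring printable ASCII, U+E007F CANCEL TAG) and the sixteen variation
    selectors at U+FE00..U+FE0F. The helper names are hypothetical, the tag
    "x-attic" is a made-up private-use tag, and pairing a Greek letter with VS1
    is purely illustrative: no such variation sequence is registered, and
    whether any font honors either mechanism is exactly the open question above.

    ```python
    # Sketch only: spells out the plain-text sequences discussed in points 13 and 15.
    # Helper names, the "x-attic" tag, and the Greek + VS1 pairing are hypothetical.

    LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
    CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG


    def to_tag_characters(tag: str) -> str:
        """Map printable ASCII (U+0020..U+007E) onto tag characters U+E0020..U+E007E."""
        for ch in tag:
            if not 0x20 <= ord(ch) <= 0x7E:
                raise ValueError(f"tag content must be printable ASCII, got {ch!r}")
        return "".join(chr(0xE0000 + ord(ch)) for ch in tag)


    def language_tagged(text: str, tag: str) -> str:
        """Open a language tag, emit the text, then cancel it with LANGUAGE TAG + CANCEL TAG."""
        return LANGUAGE_TAG + to_tag_characters(tag) + text + LANGUAGE_TAG + CANCEL_TAG


    def strip_tag_characters(s: str) -> str:
        """Remove every Plane 14 tag character (U+E0000..U+E007F) from the string."""
        return "".join(ch for ch in s if not 0xE0000 <= ord(ch) <= 0xE007F)


    def variation_sequence(base: str, n: int) -> str:
        """Append VARIATION SELECTOR-n (n = 1..16, i.e. U+FE00..U+FE0F) to a base character."""
        if not 1 <= n <= 16:
            raise ValueError("only VS1..VS16 live in the U+FE00 block")
        return base + chr(0xFE00 + n - 1)


    if __name__ == "__main__":
        serbian_te = language_tagged("\u0442", "sr")       # Cyrillic small te, tagged Serbian
        attic = language_tagged("\u03b1\u03b2", "x-attic")  # hypothetical private-use tag
        print([f"U+{ord(c):04X}" for c in serbian_te])
        # ['U+E0001', 'U+E0073', 'U+E0072', 'U+0442', 'U+E0001', 'U+E007F']
        print(strip_tag_characters(attic))                   # αβ
        print([f"U+{ord(c):04X}" for c in variation_sequence("\u03b1", 1)])
        # ['U+03B1', 'U+FE00']
    ```

    Stripping the tag characters leaves the ordinary Greek or Cyrillic text
    intact, whereas a variation selector stays attached to the single base
    character it modifies.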

    In addition, and separately:

     - you may have noticed from this discussion that "(local) script tags" seem
    more appropriate than "language tags" for all these matters, including the
    Serbian 't';

     - you may have noticed that, for major local script variations as well,
    Unicode needs to ensure that a standardized system of language (and/or
    script) tags actually exists in all font technologies before shifting
    essential responsibilities to such a system.

    Thank you for your consideration,

    This archive was generated by hypermail 2.1.5 : Wed Mar 02 2005 - 10:56:09 CST