Taiwanese proposal

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Oct 23 2002 - 12:02:55 EDT

    The WG2 home page was updated today to add a link to document N2507,
    "Draft of Proposal to add Latin characters required by Latinized
    Taiwanese Holo language to ISO/IEC 10646" [1], by a group called the
    Department of Language Education of National Taitung Teachers College.
    The document is dated either 2002-03-11 or 2002-03-31, depending on what
    part of the title page you look at.

    This document proposes a COMBINING RIGHT DOT ABOVE for use in a popular
    Latin-script orthography of the Taiwanese Holo language. Some time ago
    (I can't look up exactly when because the unicode.org archives are
    unavailable), I wrote that this combining character should be added in
    lieu of a largish collection of precomposed characters. Ken Whistler
    responded that the issue had already been debated, and a solution
    already presented to use U+0307 COMBINING DOT ABOVE (possibly
    incorporating a Taiwanese font-specific glyph variation to move the dot
    to the right).

    Evidently the Taiwanese teachers did not consider this satisfactory, as
    they have responded with this new proposal to encode a separate
    combining character.

    Whether or not this new combining character makes sense, the rest of
    the proposal clearly does not. The group has proposed no fewer than 42
    precomposed Latin characters, all of which can be formed using existing
    Latin letters and combining marks (together with the proposed RIGHT DOT
    ABOVE).

    The 42 precomposed letters are proposed "to be added to Latin
    Extended-B," which is a puzzle to me since that block has only 25
    available code positions as of Unicode 4.0.

    Much more troubling, however, is the fact that this group has apparently
    ignored or disregarded the Unicode/10646 policy against standardizing
    new precomposed letters that can be composed with existing characters.
    The document says:

    "The precomposed characters are proposed to ensure compatibility with
    the existing font "HoloWin" in the word-processing software HOTSYS
    widely employed in the user community. We have been promised composing
    characters in major (Microsoft etc.) implementations since 1997. Now, 5
    years later, we still have nothing."

    Compatibility with 8-bit legacy fonts and software is *not* sufficient
    cause for encoding new precomposed characters. The WG2 "Principles and
    Procedures" document [2] specifically states that a precomposed
    character should not be encoded "if solely intended to overcome
    short-term deficiency of rendering technology." The Taiwanese document
    does not say which "major (Microsoft etc.) implementation" fails to
    support composition using combining marks, but as a previous thread on
    this list has shown, there is at least some support in Internet Explorer
    for such characters.

    Try this experiment: One of the precomposed characters proposed by the
    Taiwanese teachers is LATIN SMALL LETTER N WITH CIRCUMFLEX. Here it is,
    encoded properly as U+006E U+0302:

    n̂

    Some of you will be able to see this character, others will not.
    Rendering technology is not perfect yet. But this is the correct way to
    create new accented letters in Unicode/10646, not by adding more
    precomposed characters.
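
    The same experiment can be sketched in code. The following is only an
    illustration, assuming Python and its standard unicodedata module
    (neither is mentioned in the proposal). It normalizes the sequence
    U+006E U+0302 to NFC and, for comparison, does the same for
    U+0065 U+0302, which already has a precomposed equivalent. The helper
    name describe() is made up for this example.

```python
import unicodedata


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# LATIN SMALL LETTER N followed by COMBINING CIRCUMFLEX ACCENT.
n_circumflex = "\u006E\u0302"

# For comparison: e + circumflex, which has the precomposed form U+00EA.
e_circumflex = "\u0065\u0302"

# NFC composes a sequence only where a precomposed character exists.
n_nfc = unicodedata.normalize("NFC", n_circumflex)
e_nfc = unicodedata.normalize("NFC", e_circumflex)

for label, text in (("n + U+0302", n_nfc), ("e + U+0302", e_nfc)):
    print(label, "->", describe(text))
```

    With the Unicode data shipped in current Python releases, the "n"
    sequence should stay as two code points after normalization, while the
    "e" sequence collapses to a single one. Either way, both are valid
    encodings of an accented letter; whether they display correctly is a
    matter for the font and the rendering engine.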

    The proposal for a new COMBINING RIGHT DOT ABOVE may or may not have
    merit -- I'm not going to commit firmly to the idea that it does, as I
    did last time -- but the 42 precomposed letters have no business being
    encoded and should not be debated further.

    -Doug Ewell
     Fullerton, California

    [1] http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2507.pdf
    [2] http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2352r.pdf
