From: Edward H. Trager (
Date: Sat Oct 22 2005 - 13:53:46 CST

    On Friday 2005.10.21 20:41:15 -0700, Asmus Freytag wrote:
    > On 10/20/2005 8:58 AM, Edward H. Trager wrote:
    > >Would anyone even buy Webster's if the publisher did that?
    > >Would any scientist even bother to read Nature if the journal did that?
    > >Am I opening up a can of worms?
    > >
    > Your comparisons are not as apt as you imagine.
    > There is always a tension between formal identifiers and human use of
    > them as mnemonic labels. Would the C-runtime be more or less successful,
    > if each vendor had vetted the names of the functions and renamed those
    > that were misleading?
    > How about the order of parameters supplied to these functions - they are
    > downright inconsistent, and do lead to programming errors -- yet no-one
    > wants to re-architect that API.
    > All these are environments where absolute stability allows existing
    > software to continue to operate, even if the burden of navigating the
    > inconsistencies is placed on the human user.
    > The Unicode character names were designed for use as (shared)
    > identifiers across a series of ISO standards - the same name is used in
    > several standards for the same character, even if the character is at
    > different positions in each. The use of these identifiers outside this
    > ISO environment may not have materialized to the degree its designers
    > (predating Unicode) had hoped.
    > They certainly could not have envisioned the degree to which users would
    > try to rely on these labels as a source of information *about* the
    > character in question. However, that doesn't make their design invalid,
    > nor does it invalidate the later decision (supported by Unicode) to take the
    > logical conclusion from the original design parameters and make the
    > character names formally immutable. Not doing so would have left only
    > the code position as a unique identifier, and that lacks the necessary
    > redundancy for such a large set of characters.
    > One side effect of this stability is that it is now possible to document
    > issues about characters in a human readable way that is stable. You can
    > now write: "LATIN CAPITAL LETTER AE is really a ligature" and not have
    > to update this sometime in the future when the committee might decide
    > suddenly that the name should reflect this rather than that aspect of a
    > character.
    > The other side effect is that it has enabled the UTC to get on with
    > business, since the number of requests to change character names based
    > on preference would have been a serious drain on resources -- instead,
    > they either result in editorial addition of a comment, or a footnote in
    > the meeting minutes.
    > >The decision by the Unicode Consortium to *not* provide normative names for
    > >the Unified Han characters represents a precedent of either
    > >*inconsistency* or *amendment* in the application of the rule requiring
    > >assignment of normative names in Unicode.
    > >
    > The normative name for character 4E00 is CJK UNIFIED IDEOGRAPH-4E00 and
    > so on. Your statement is contra-factual. Given that the names are

    My point, which I guess I did not get across, was that the normative names
    for the unified CJK characters are completely uninformative: the information
    content that I can extract from "U+4E00" as the code point value and from
    "CJK UNIFIED IDEOGRAPH-4E00" is basically the same -- the only difference
    being that the normative name tells me it is a CJK character. But which one?
    I recognize *why* it was done this way and I was not, nor am I now, going to
    argue about the normative CJK naming. I was merely trying to point out that
    there is precedent for having different rules for different blocks in Unicode,
    with CJK unified ideographs and Hangul syllables being examples.
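    To make the point above concrete, here is a minimal Python sketch using
    the standard `unicodedata` module. The helper `name_adds_information` is
    a hypothetical name introduced only for this illustration: it checks
    whether a character's name simply ends in its own hexadecimal code point
    (as the CJK unified ideograph names do) or carries something more.

```python
import unicodedata


def name_adds_information(ch: str) -> bool:
    """Return False when the name merely repeats the code point in hex.

    Hypothetical helper for illustration: a name such as
    "CJK UNIFIED IDEOGRAPH-4E00" ends in "-4E00", so beyond the block
    label it says nothing that "U+4E00" does not already say.
    """
    name = unicodedata.name(ch)
    suffix = f"-{ord(ch):04X}"
    return not name.endswith(suffix)


# A CJK unified ideograph, a Latin letter, and a Hangul syllable.
for ch in ("\u4e00", "\u00c6", "\uac00"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), name_adds_information(ch))
```

    The Hangul syllable names are also generated by rule, but from the
    component jamo rather than from the code point, so that block follows
    yet another convention.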

    If the Unicode Consortium had accidentally mixed up the names of letters
    in the Latin alphabet, say like this:


    ... and then *insisted* that these names were immutable, would the consortium
    not be the laughing stock of the ISO, W3C, IEC, IETF, and numerous other standards organizations?
    Would the ISO and IEC really have allowed such a thing to stand when it came
    time to ratify the standard?

    But for Lao ... ooops, it will have to stand uncorrected. Obviously the
    government of the Lao People's Democratic Republic is not a member of the
    Unicode Consortium!

    - Ed Trager

    This archive was generated by hypermail 2.1.5 : Sat Oct 22 2005 - 13:43:45 CST