Re: Latin letter GHA or Latin letter IO ?

From: Kenneth Whistler (
Date: Mon Jan 05 2004 - 16:42:27 EST


    Peter Kirk wrote in response to Philippe Verdy:

    > But you do seem to have found a real problem with the standard. If the
    > character name is not guaranteed to be an accurate means of
    > identification of the character, and the glyph is not normative, how can
    > I know from the standard that U+01A3 is intended to be this pan-Turkic
    > gha, i.e. that that is its fundamental character identity, and that it
    > is not in fact a character in some other even more obscure variant Latin
    > alphabet which is actually named or pronounced "oi"? Of course the notes
    > do help, as does the glyph, but these are not normative.

    You know by making use of the standard, where the informative
    notes (= gha, * Pan-Turkic Latin alphabets) were added precisely
    to enable the proper identification.

    You also know, in the case of confusing edge cases, by coming
    to this discussion list and browsing through the archives for
    the "story" of U+01A2/U+01A3, which is abundantly documented
    there, or by asking the various experts who are familiar with
    the intent of the standard.

    This is all *much*, *much* easier than some of the problems
    posed by Han characters, where the identity of many of the
    obscure, historic characters is a matter of extensive research
    into the cross-references in Unihan.txt to pin them down to
    sources and differentiate them from the numerous kinds of
    variants that occur in the vast sea of Han characters.

    For the nit-pickers, here is my assessment of the status of
    name and glyph in the standard.

    The character name is normative and immutable. That doesn't
    mean that it is always "correct", as we have discovered in
    cases such as U+01A3. The character name is also a mandatory
    part of the documentation of the standard -- it is present
    for every character, either explicitly listed or a rule
    given whereby it can be derived (for Hangul and Han).
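    As a rough illustration of the "explicitly listed or derived by
    rule" distinction, here is a minimal Python sketch. It assumes
    the standard library's unicodedata module as a stand-in for the
    names list, and the jamo short-name tables of the Hangul syllable
    name rule. The helper names hangul_syllable_name and
    han_ideograph_name are hypothetical, chosen for this example only.

```python
import unicodedata

# Constants of the Hangul syllable name rule.
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172 precomposed syllables

# Jamo short names: leading consonants, vowels, trailing consonants.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI",
          "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM",
          "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS",
          "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(cp: int) -> str:
    """Derive the name of a precomposed Hangul syllable by rule."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{cp:04X} is not a Hangul syllable")
    l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return ("HANGUL SYLLABLE "
            + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index])


def han_ideograph_name(cp: int) -> str:
    """Derive a unified ideograph name from its code point.

    Assumption: the caller passes a code point that really is a
    CJK unified ideograph; no range check is done here.
    """
    return f"CJK UNIFIED IDEOGRAPH-{cp:04X}"


# Explicitly listed name: immutable, even though "OI" is a misnomer.
print(unicodedata.name("\u01a3"))      # LATIN SMALL LETTER OI
# Names derived by rule rather than listed one by one.
print(hangul_syllable_name(0xAC00))    # HANGUL SYLLABLE GA
print(han_ideograph_name(0x4E00))      # CJK UNIFIED IDEOGRAPH-4E00
```

    The listed name for U+01A3 comes back unchanged however
    misleading it is, which is what "normative and immutable"
    amounts to in practice.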

    The character glyph is informative and mutable. What this means
    is that the character committees are not attempting to
    *standardize* the glyph shapes per se. Unicode is not a
    font standard. Different fonts have been used to print
    the standard(s) over the years, so there have been minor
    emendations to the particulars of the glyphs.
    However, the glyphs are also a *mandatory* part of the documentation
    of the standard. They are present for every character, precisely
    to assist, via a representative glyph shape, in the proper
    identification of the character encoded at each code point
    in the standard.

    When the combination of character name and representative
    glyph and associated informative annotations is insufficient
    to correctly identify a character in the standard, the
    recourse is to Ask the Experts and request further annotation
    of the standard to keep future users from running into the
    same problem.

