From: Philippe Verdy (
Date: Sat Oct 22 2005 - 08:19:26 CST

    From: "Asmus Freytag" <>
    >> Without such notice published with the standard itself, this standard
    >> remains confusive (and given that the representative glyph is not
    >> completely normative and can be changed at any time for another or less
    >> confusive representation of some encoded character, users may favor the
    >> interpretation given by the normative character name, hence generating
    >> encoding errors...).
    > Glyphs are indeed not normative, because no single glyph ever captures the
    > entire range of visual appearance for the character. Nevertheless there
    > are limits on what changes can be made to a glyph, the most important of
    > them being that the change in glyph must respect the underlying character
    > identity.

    I know all that (this last sentence is explained within the standard).

    But this is not a definitive argument. Unicode has already changed the
    appearance of such glyphs in ways that effectively changed the underlying
    character identity. The glyph is then just a hint; it does not define the
    character identity itself. Once you set the glyph aside, what remains of
    the character identity is its name and its normative properties.

    But the character properties of two letters of the same script are almost
    identical (this is the case for the Lao letters discussed here). So only
    the normative character name remains to identify the character. But
    Unicode says that the name is just an identifier, without much semantic
    meaning, because it is immutable and equivalent in meaning to its
    associated numeric code point.
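
    As a rough illustration of this point, the sketch below compares a few
    basic properties of two Lao letters using Python's standard unicodedata
    module. The pair U+0E9D / U+0E9F is an assumption chosen for the example
    (this message does not say which letters are meant), and the helper names
    are hypothetical. The property list is only a small sample, not the full
    set of normative properties.

```python
import unicodedata

# Assumed example pair; the message itself does not name the Lao letters.
CODE_POINTS = (0x0E9D, 0x0E9F)


def core_properties(cp):
    """Return a small sample of character properties for one code point."""
    ch = chr(cp)
    return {
        "category": unicodedata.category(ch),
        "bidirectional": unicodedata.bidirectional(ch),
        "combining": unicodedata.combining(ch),
        "mirrored": unicodedata.mirrored(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


def compare(cp_a, cp_b):
    """Return the sampled property keys that differ, plus both names."""
    props_a = core_properties(cp_a)
    props_b = core_properties(cp_b)
    differing = {key for key in props_a if props_a[key] != props_b[key]}
    names = (unicodedata.name(chr(cp_a)), unicodedata.name(chr(cp_b)))
    return differing, names


if __name__ == "__main__":
    differing, names = compare(*CODE_POINTS)
    print("differing sampled properties:", sorted(differing))
    print("names:", names)
```

    If the sampled properties come out identical, the character name is the
    only thing in this data that tells the two letters apart, which is the
    weakness described above.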

    Conclusion: the character identity is very weak. There must exist something
    else to confirm this identity. If the name is wrong, then there must be a
    strong notice, part of the standard, that explicitly says so and explains
    the expected semantics.

    In fact, the semantics of a character are only confirmed by its most
    common actual usage. This is a pragmatic point of view, based on consensus
    on how it should be interpreted, and Unicode then just endorses this common
    practice by enriching the very basic properties defined in ISO/IEC 10646.
    To make this endorsement more normative, any possible confusion MUST be
    explained by Unicode itself; otherwise this is not a standard, people
    will continue to use characters the way they want, and the Unicode standard
    would not even be needed (ISO/IEC 10646 would be enough)!

    If I see only one strong argument in favor of Unicode, it is exactly the
    set of additional normative properties that it adds to the ISO/IEC 10646
    standard. So if there is an error in a normative character name (which is
    not part of the Unicode standard itself, but part of ISO/IEC 10646) and the
    name is immutable, then it is the role of Unicode to clarify this. If it
    does not do that, then there is no standard for the affected characters,
    and any interpretation suggested by the normative character name or by the
    representative glyph is correct.