Re: Using combining diacritical marks and non-zero joiners in a name

From: Asmus Freytag
Date: Fri Apr 18 2008 - 18:01:19 CDT


    On 4/18/2008 10:34 AM, Jukka K. Korpela wrote:
    > When you use a combining diacritic mark, programs may deal with it in
    > several ways:
    > 1) render the base character and the diacritic, positioned by the
    > principles outlined by the Unicode Consortium, aimed at producing good
    > quality for all possible combinations of base characters and diacritics
    > 2) render the base character and overprint it with the diacritic at a
    > fixed position, often resulting in poor or very poor presentation
    > 3) render the combination using a precomposed glyph, when available in a
    > font; note: the combination need not correspond to a precomposed
    > _character_
    > 4) internally convert the combination to a precomposed character (when
    > applicable) and render it.
    > Is there any reason why any of these would be _wrong_? Surely (2) means
    > poor quality, but I'd say it's just that, not incorrectness. And (4) is
    > something that the Unicode Standard fairly explicitly permits:
    > applications may well treat canonically equivalent sequences as the
    > same.
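
    Point (4), incidentally, is just canonical composition, which
    normalization libraries expose directly; a minimal sketch in Python,
    using the standard unicodedata module:

```python
import unicodedata

# Method (4): convert a base + combining mark sequence to its
# precomposed form (NFC) before rendering, when one exists.
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = unicodedata.normalize("NFC", decomposed)

# The result is the single character U+00E9 LATIN SMALL LETTER E WITH ACUTE.
assert precomposed == "\u00e9"
assert len(decomposed) == 2 and len(precomposed) == 1

# Canonical equivalence runs both ways: decomposing the precomposed
# character recovers the original base + mark sequence.
assert unicodedata.normalize("NFD", precomposed) == decomposed
```

    Note that, as the quoted point (3) says, a font may supply a
    precomposed glyph even for combinations that have no precomposed
    character, so this conversion only covers part of the rendering
    problem.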

    I think there are several different levels at which one can answer your
    question. One is the narrow question of what behavior is conformant to
    the Unicode standard. We agree that all four of these forms of rendering
    are conformant (each of them treats the combining mark as a combining
    mark, thus satisfying the "interpretation" clause). Incidentally,
    there's a fifth method:

    5) render the base character and mark separately, but use relative
    positioning information present in the font.

    Now, from a typographical standpoint, some of these solutions are less
    satisfactory than others. There's no written standard for the
    typography, so there's nothing to claim conformance to; when you
    call something "wrong", it has to be in the sense of being
    typographically so unsatisfactory as to be practically unacceptable.
    Method (2) seems to qualify for being "wrong" in that sense.

    However, method (1), which relies only on general principles for
    positioning, is merely better, and will, in some situations, not
    produce acceptable results either.

    The other three methods are all clearly typographically acceptable, but
    may still not produce the right results for some marks in some
    languages, unless you further allow them to give different results
    based on language.

    It's a deliberate limitation of Unicode conformance that it focuses its
    requirements on the *identity* of the character, not on the finer points
    of typography. In other words, conformance seeks to ensure that
    writers know which characters to use to designate a combination of base
    and mark, and that receivers know, when they receive the data, which
    combination was intended. Whether the glyph produced for it is a
    single one or a composite, looks pleasing or ugly, is typographically
    acceptable or not, is left to another, typographic, dimension.

    And I tend to think that separation makes the standard stronger.



    This archive was generated by hypermail 2.1.5 : Fri Apr 18 2008 - 18:04:21 CDT