From: Asmus Freytag (firstname.lastname@example.org)
Date: Fri Apr 18 2008 - 18:01:19 CDT
On 4/18/2008 10:34 AM, Jukka K. Korpela wrote:
> When you use a combining diacritic mark, programs may deal with it in
> several ways:
> 1) render the base character and the diacritic, positioned by the
> principles outlined by the Unicode Consortium, aimed at producing good
> quality for all possible combinations of base characters and diacritics
> 2) render the base character and overprint it with the diacritic at a
> fixed position, often resulting in poor or very poor presentation
> 3) render the combination using a precomposed glyph, when available in a
> font; note: the combination need not correspond to a precomposed
> character
> 4) internally convert the combination to a precomposed character (when
> applicable) and render it.
> Is there any reason why any of these would be _wrong_? Surely (2) means
> poor quality, but I'd say it's just that, not incorrectness. And (4) is
> something that the Unicode Standard fairly explicitly permits:
> applications may well treat canonically equivalent sequences as the
> same.
I think there are several different levels at which one can answer your
question. One is the narrow question of what behavior is conformant to
the Unicode standard. We agree that all four of these forms of rendering
are conformant (each of them treats the combining mark as a combining
mark, thus satisfying the "interpretation" clause). Incidentally,
there's a fifth method:
5) render the base character and mark separately, but use relative
positioning information present in the font.
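
To make method (4) concrete, here is a minimal sketch in Python, assuming the
standard library's unicodedata module as a stand-in for whatever a renderer
does internally. The helper names to_precomposed and canonically_equivalent
are made up for illustration; they are not taken from any rendering engine.

```python
import unicodedata

def to_precomposed(text: str) -> str:
    """Compose base + combining mark wherever a precomposed character exists (NFC)."""
    return unicodedata.normalize("NFC", text)

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their full decompositions (NFD) match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

decomposed = "e\u0301"       # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE
no_precomposed = "x\u0301"   # x + acute: Unicode has no precomposed character for this

# The mark is recognized as a combining mark (non-zero combining class).
print(unicodedata.combining("\u0301") != 0)

# "When applicable": the e + acute sequence composes to a single character ...
print(to_precomposed(decomposed) == precomposed)

# ... while x + acute stays a two-code-point sequence and has to be
# rendered by one of the other methods.
print(len(to_precomposed(no_precomposed)))

# Either way, the identity of the text is unchanged.
print(canonically_equivalent(decomposed, precomposed))
```

The last check is the part that matters for conformance: composing or not
composing changes the code points an implementation works with, not which
character the sender designated.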
Now, from a typographical standpoint, some of these solutions are less
satisfactory than others. There's no written standard for the
typography, so there's nothing you can claim conformance to, so when you
call something "wrong", it has to be in the sense of being
typographically so unsatisfactory as to be practically unacceptable.
Method (2) seems to qualify for being "wrong" in that sense.
However, method (1), which only relies on general principles for
positioning, is merely better, and will, in some situations, not produce
acceptable results either.
The other three methods are clearly all typographically acceptable, but
still may not produce the right results for some marks for some
languages, unless you further allow them to give different results based
on the language.
It's a deliberate limitation of Unicode conformance that it focuses its
requirements on the *identity* of the character, not on the finer points
of typography. In other words, the conformance seeks to ensure that
writers know which characters to use to designate a combination of base
and mark, and receivers know, when they receive the data, which
combination was intended. Whether the glyph they produce for this is a
single one, or composite, looks pleasing or ugly, is typographically
acceptable or not, that's left to another, typographic, dimension.
And I tend to think that makes the standard stronger somehow.