RE: Generic base characters

From: Kent Karlsson (
Date: Mon Jul 16 2007 - 11:18:00 CDT

    Asmus Freytag wrote:
    > Such a missing base character is a bug in the text.

    Yes, but...

    > Despite
    > the recommended fallback that you describe, the policy of
    > making that visible to the author by inserting a dotted
    > circle is, in principle, reasonable.

    1) I'm not so sure about that. It's better to have a single defined
    behaviour (assuming the characters in question are at all supported).

    2) NBSP base is for sequences of combining characters preceded
    by beginning of string or by a control char. I think using NBSP
    as the implicit base in such cases is a reasonable behaviour.
    (Inserting a dotted circle is not.)

    3) This thread started talking about there actually being a base
    present in the text just before the combining sequence, just that
    the base was in another script (or some symbol/punctuation).
    That is not an error case from a text rendering point of view.
    There is no reason to start inserting dotted circle, NBSP,
    or anything else. Ligation, kerning, and positioning adjustments
    are unlikely to work except for special cases, but some rough
    approximation (assuming again that the individual characters are
    supported at all by the rendering system and font used) should
    be output.
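
A minimal sketch of the behaviour argued for in points 2 and 3, assuming "combining character" means general category Mn, Mc, or Me and "control char" means category Cc. The function name add_implicit_base is hypothetical and not taken from any rendering engine; a real renderer would do this at the glyph level rather than by editing the string.

```python
import unicodedata

NBSP = "\u00A0"  # NO-BREAK SPACE, used here as the implicit base


def is_combining(ch):
    # Assumption: combining characters are those with category Mn, Mc or Me.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def is_control(ch):
    # Assumption: "control char" means general category Cc.
    return unicodedata.category(ch) == "Cc"


def add_implicit_base(text):
    """Insert NBSP before a combining sequence that has no base at all.

    A sequence has no base only when it starts the string or directly
    follows a control character. Any other preceding character, from any
    script, counts as a base and nothing is inserted.
    """
    out = []
    prev = None
    for ch in text:
        if is_combining(ch) and (prev is None or is_control(prev)):
            out.append(NBSP)
        out.append(ch)
        prev = ch
    return "".join(out)
```

With this sketch, a string that starts with U+0301 gets an NBSP in front of it, while "x" followed by U+0301, or an underscore followed by a Lao combining vowel, passes through unchanged.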

    > Authors should not have
    > an expectation of portably exchanging buggy text with perfect

    What is "buggy text" (from a rendering engine+font point of view)?
    Can you give me an example in English? Writing moooose may be an
    error (or maybe not), but the rendering engine+font should not care.
    It may be a spelling error, but it is in no way "buggy text" at
    the rendering level. Writing <DEVANAGARI LETTER MA, DEVANAGARI
    may be an error (or maybe not), but the rendering engine+font
    should not care. It may be a spelling error, but it is in no way
    "buggy text" at the rendering level. There is a problem with
    above/below combining characters in that proper stacking will
    quickly go outside of the line or even page boundary. But that
    is a problem of a different kind, and not buggy text per se.

    > fidelity, so making them aware of the problem leads to more
    > robust interchange.

    Which problem? Indicating such things as spelling errors is not
    the business of the low level text renderer.

    > Now, there are several problems with this approach (depending
    > on how it is implemented).
    > If the policy leads to authors creating didactic texts that
    > rely on the presence of the dotted circle, that is a problem.
    > If the implementation prevents users from specifying some
    > other reasonable base character, and insists to show a dotted
    > circle nevertheless, that prevents users from creating
    > reasonable texts, limiting the functionality of the

    A rendering system should have no opinion on what text is
    "reasonable" or not. I can understand that, for items that would
    be 100% confusable and that *should have been* canonically
    equivalent (but aren't), one of the representations results
    in blurred text (in some way, such as error glyphs being used).
    But that needs to be defined in the standard so that everyone
    makes the same choice of which *should-have-been* equivalent
    to blur out, and which to show without blurring. But that is
    not the case for say <underline, any combining visible character>,
    nor for <latin letter small x, any combining visible character>.
    Nor for <lao consonant, lao combining vowel, lao combining vowel,
    lao combining vowel>.

    However, <LAO VOWEL SIGN E, LAO VOWEL SIGN E> is 100% confusable
    with LAO VOWEL SIGN EI, but they are not canonically equivalent
    (as they should have been) so one needs to be blurred to avoid
    spoofing possibilities. But neither of these characters are
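
The lack of canonical equivalence mentioned above can be checked with Python's standard unicodedata module. This is only a sketch: the helper name canonically_equivalent is hypothetical, and the result reflects whatever version of the Unicode data ships with the Python in use.

```python
import unicodedata

LAO_E = "\u0EC0"   # LAO VOWEL SIGN E
LAO_EI = "\u0EC1"  # LAO VOWEL SIGN EI
doubled = LAO_E + LAO_E


def canonically_equivalent(a, b):
    # Two strings are canonically equivalent when their NFD forms match.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The two spellings compare as different strings under normalization,
# even though they may be visually indistinguishable when rendered.
print(canonically_equivalent(doubled, LAO_EI))
```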

    > implementation. Particularly egregious if an implementation
    > prevents the user from providing a code point for the dotted
    > circle explicitly.
    > > There is no notion of "invalid"/"valid" base character for
    > > a combining character in Unicode.
    > >
    > But there is also no notion that an implementation has to
    > support *all* sequences of characters. It is desirable to
    > create implementations that don't get in the way of the
    > users' needs, but in some cases, limiting the capabilities
    > results in a more stable, more easily tested implementation
    > that can deliver the *intended* support more correctly and at
    > times also more cheaply.

    I'm not sure exactly what this refers to. But I understand that
    not everything can be tested. Do spurious dotted circles mean
    "we did not test this"?

                    /kent k
