Re: Generic base characters

From: Asmus Freytag
Date: Sun Jul 15 2007 - 17:30:35 CDT


    On 7/15/2007 1:44 PM, Kent Karlsson wrote:
    > Peter Constable wrote:
    >> Uniscribe inserts a dotted circle glyph only when the author has not
    >> included a valid base character for the mark.
    > There is always a base character for any non-empty sequence of combining
    > characters in a text. If there is no explicit one (because the sequence
    > occurs at the beginning of the text or after a control character), NBSP
    > is the implicit base character. (It is probably best if rendering
    > engines insert it during rendering, to get consistent behaviour, esp.
    > w.r.t. explicit NBSP in the text.)
    Such a missing base character is a bug in the text. Despite the
    recommended fallback that you describe, the policy of making that
    visible to the author by inserting a dotted circle is, in principle,
    reasonable. Authors should not have an expectation of portably
    exchanging buggy text with perfect fidelity, so making them aware of the
    problem leads to more robust interchange.
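
    To make the two policies concrete, here is a minimal Python sketch. It
    is only an illustration, not how Uniscribe or any other engine works:
    the function names are hypothetical, "control character" is assumed to
    mean general category Cc, and a "mark" is assumed to mean any character
    whose general category starts with M. The sketch finds combining marks
    that have no explicit base and inserts either NBSP (the fallback
    described above) or a visible dotted circle.

```python
import unicodedata

NBSP = "\u00A0"           # implicit base in the fallback described above
DOTTED_CIRCLE = "\u25CC"  # visible placeholder some engines show instead


def is_mark(ch):
    # Assumption: treat general categories Mn, Mc and Me as combining marks.
    return unicodedata.category(ch).startswith("M")


def is_control(ch):
    # Assumption: "control character" means general category Cc only.
    return unicodedata.category(ch) == "Cc"


def find_baseless_marks(text):
    """Return indices of marks at the start of text or right after a control."""
    return [
        i
        for i, ch in enumerate(text)
        if is_mark(ch) and (i == 0 or is_control(text[i - 1]))
    ]


def insert_base(text, base=NBSP):
    """Return a copy of text with `base` inserted before each baseless mark."""
    baseless = set(find_baseless_marks(text))
    out = []
    for i, ch in enumerate(text):
        if i in baseless:
            out.append(base)
        out.append(ch)
    return "".join(out)


# Acute accent at start of text, diaeresis right after a newline,
# and a well-formed "e" + acute at the end.
sample = "\u0301a\n\u0308b e\u0301"
silent = insert_base(sample)                  # NBSP fallback, invisible
visible = insert_base(sample, DOTTED_CIRCLE)  # makes the problem visible
```

    Passing the base as a parameter keeps the detection step separate from
    the choice of what to show, which is the policy question under
    discussion here.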

    Now, there are several problems with this approach (depending on how it
    is implemented).

    If the policy leads to authors creating didactic texts that rely on the
    presence of the dotted circle, that is a problem.

    If the implementation prevents users from specifying some other
    reasonable base character, and insists on showing a dotted circle
    nevertheless, that prevents users from creating reasonable texts,
    limiting the functionality of the implementation. This is particularly
    egregious if an implementation prevents the user from providing a code
    point for the dotted circle explicitly.
    > There is no notion of "invalid"/"valid" base character for a combining
    > character in Unicode.
    But there is also no notion that an implementation has to support *all*
    sequences of characters. It is desirable to create implementations that
    don't get in the way of the users' needs, but in some cases, limiting
    the capabilities results in a more stable, more easily tested
    implementation that can deliver the *intended* support more correctly
    and at times also more cheaply.

    >> Perhaps you have in mind that a font developer should control what glyph
    >> is used in that situation, but I see a need, on the assumption that
    >> authors should be, and normally are, explicitly intentional about what is in
    >> their document, and that Uniscribe's fallback rendering is just that: a
    >> fallback.
    > No, it is:
    >> A bug, which can be looked at.
    > No dotted circle is to be inserted by any rendering engine or by any font.
    > The only dotted circles to be rendered are those explicitly in the text.
    The problem with using U+25CC is that it is *not* the dotted circle that
    is used as a base for combining characters in the standard. While its
    name is "DOTTED CIRCLE", it was encoded to cover a symbol that differs
    in size, weight, and details of line style, and perhaps also in
    vertical alignment, from the true dotted circle used as a generic base.

    Fonts intending to support the sequence of geometric shapes, especially
    in the context of compatibility with mathematical and technical symbol
    sets, would be ill-served by using the generic base character version.
    > /kent k
