RE: Generic base characters - From Phetsarath Lao font

From: Brian Wilson
Date: Mon Jul 16 2007 - 08:47:35 CDT

    I probably have this all wrong, but aren't there 65,536 possible characters in Unicode?

    Why not have a section of 48 characters for generic bases? Encode the 10 characters that John Hudson recommends. All of the generic bases would be in one section of Unicode, and there would be plenty of room for expansion. That saves us ignorant people from wondering, "Now which 'x-like' symbol do I use for Lao again?"

    Brian Wilson

    -----Original Message-----
    From: [] On Behalf Of James Kass
    Sent: Monday, July 16, 2007 6:21 AM
    To: Asmus Freytag
    Cc: 'Unicode List'
    Subject: Re: Generic base characters

    Asmus Freytag wrote,

    > The problem with using 25CC is that it is *not* the dotted circle that
    > is used as a base for combining characters in the standard. While its
    > name is "DOTTED CIRCLE", it was encoded to cover a symbol that differs
    > in size, weight, and details of line style, as well as perhaps
    > vertical alignment from the true dotted circle used as a generic base.

    Two related issues.

    1) Fallback rendering of unexpected isolated combining marks.
    2) An author entering desired generic bases plus combining marks
    in plain text for illustrative/informative purposes.

    Fallback rendering is up to the font engine.

    A listing of Unicode characters suitable for use as generic base
    characters, such as John Hudson suggested, might, among other
    purposes, be used as a guideline for localization of operating systems.

    An unexpected isolated combining mark occurs when a font engine
    encounters a sequence which it does not support. So, the font
    engine needs to support the entire listing in order to avoid treating
    such combinations as unexpected. The engine should not insert
    undesired fallback display behavior when an author has specifically
    encoded a desired form.
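
    To make the distinction concrete, here is a minimal Python sketch of
    the fallback idea, not a description of any real font engine. It
    assumes U+25CC as the fallback base and treats any character that is
    not a separator or control as a valid base; the function name
    insert_fallback_base is hypothetical. A mark that already follows an
    author-supplied base is left alone.

```python
import unicodedata

# Assumption: U+25CC DOTTED CIRCLE stands in for the fallback base glyph.
DOTTED_CIRCLE = "\u25CC"

COMBINING_CATEGORIES = ("Mn", "Mc", "Me")


def insert_fallback_base(text, base=DOTTED_CIRCLE):
    """Insert `base` before any combining mark that has no base character.

    Hypothetical helper for illustration only. A mark counts as isolated
    when it starts the text or follows a separator or control character.
    """
    out = []
    has_base = False
    for ch in text:
        category = unicodedata.category(ch)
        if category in COMBINING_CATEGORIES:
            if not has_base:
                # Unexpected isolated mark: supply the fallback base once;
                # any further marks in the same run attach to it.
                out.append(base)
                has_base = True
        else:
            # Letters, symbols, digits, etc. can carry marks;
            # separators (Z*) and controls/format characters (C*) cannot.
            has_base = category[0] not in ("Z", "C")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    samples = ["\u0301", "a\u0301", "\u25CC\u0EB4", " \u0EB4"]
    for sample in samples:
        result = insert_fallback_base(sample)
        print([f"U+{ord(c):04X}" for c in result])
```

    In this sketch an author who types an explicit base before a Lao
    vowel sign gets exactly what was typed, while a bare mark gets the
    fallback base.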

    For the generic base glyph which resembles a plus sign, would it be
    better to have a new, dedicated character, or to choose one from
    the similar signs already encoded? (examples: +˖ᐩ⁺₊⊕⊞⧾⨁⨹﬩+)

    (Plus signs which already have diacritics [⨢⨣⨤⨥⨦] should probably be
    excluded from consideration.)
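
    For anyone comparing the candidates, a short Python sketch along these
    lines prints the code point and character name of each sign listed
    above. The code points are my transcription of the example list, so
    treat them as an assumption, and the names come from whatever Unicode
    version the local Python build ships with.

```python
import unicodedata

# Assumption: these code points transcribe the plus-like examples
# quoted in the message; verify against the original text if it matters.
CANDIDATES = (
    "\u002B\u02D6\u1429\u207A\u208A\u2295"
    "\u229E\u29FE\u2A01\u2A39\uFB29\uFF0B"
)


def describe(chars):
    """Return (code point label, character name) pairs for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in chars
    ]


if __name__ == "__main__":
    for label, name in describe(CANDIDATES):
        print(f"{label}  {name}")
```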

    John indicated that there are possibly fewer than ten attested
    shapes used as generic bases. Why not encode them as characters in
    their own right? After all, what's another plus sign, more or less?

    Best regards,

    James Kass

    This archive was generated by hypermail 2.1.5 : Mon Jul 16 2007 - 08:51:14 CDT