Re: Request clarification on disunification based on different character properties

From: verdy_p (
Date: Sun Sep 06 2009 - 10:50:20 CDT


    "Asmus Freytag" wrote:
    > Like Mark pointed out, if it is a standard letter / digit of a script,
    > Indic or not, then any more or less accidental (or historical) shape
    > similarities/equivalences are usually not obstacles to disunification.

    It's true that the type of the character plays an important role in admitting its disunification: if it's a letter
    or digit (or a diacritic used in combination with a letter or digit), it merits disunification when it is used in a
    script that is already disunified from another script. (Note that here I consider mathematical notation as a
    script distinct from Latin/Greek, even if the symbols are similar: this is needed for the consistency of the
    notation, which has stricter composition rules and identification constraints than the regular scripts used to
    write human languages, where some style differences are just considered free stylistic variation that does not
    change the meaning of the underlying text.)

    But if it's a punctuation sign (like the dandas in Indic scripts) or a symbol (like currency signs), it tends to keep
    its unification, unless the existing character has unsuitable properties for the new use, notably its directionality,
    or layout properties within the composition squares of East Asian scripts that may also be rendered vertically, such
    as fullwidth or line-break properties, because these determine whether extra spaces or format controls have to be
    encoded in plain text around the character.
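To make the properties named above concrete, here is a small sketch that prints, for a few similar-looking punctuation pairs, the General_Category, Bidi_Class and East_Asian_Width values as reported by Python's standard `unicodedata` module. The choice of pairs and the helper name `layout_props` are my own illustrations, not taken from any proposal, and the output says nothing about *why* these characters were encoded separately. Note that the stdlib module does not expose the Line_Break property, so that one is not shown.

```python
import unicodedata
from typing import Tuple


def layout_props(ch: str) -> Tuple[str, str, str, str]:
    """Return (name, General_Category, Bidi_Class, East_Asian_Width) for one code point."""
    return (
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.east_asian_width(ch),
    )


# Illustrative pairs of similar punctuation whose properties differ
# (chosen for this sketch only).
PAIRS = [
    ("\u003F", "\u061F"),  # QUESTION MARK vs ARABIC QUESTION MARK: Bidi_Class differs
    ("\u002C", "\uFF0C"),  # COMMA vs FULLWIDTH COMMA: East_Asian_Width differs
]

if __name__ == "__main__":
    for pair in PAIRS:
        for ch in pair:
            name, gc, bidi, eaw = layout_props(ch)
            print(f"U+{ord(ch):04X} {name:<22} gc={gc} bidi={bidi} eaw={eaw}")
        print()
```

Both members of each pair share the same General_Category (Po); the difference shows up only in the directional or width properties, which is the kind of difference the paragraph above says can justify a separate encoding for punctuation.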

    Glyph similarity, or even apparent semantics, is not the only factor considered: maintaining the internal logic
    of the script in which the character will be used is also an important and useful consideration.

    So if your Indic character is similar to a character from another Indic script (from which it was probably
    borrowed), it should be disunified from the existing one if it is a letter or digit; it then merits its own
    encoding, so that it works best within the rendering rules of the target script. That's why some zero digits were
    added later to Indic scripts that did not have them initially, but also why the dandas were not disunified and are
    shared by several distinct Indic scripts.
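The contrast between per-script digits and shared dandas can be seen directly in the character data. The sketch below uses Python's `unicodedata` to show that each script's zero is a separate decimal digit (General_Category Nd), including TAMIL DIGIT ZERO (U+0BE6, a later addition to the Tamil block), while the dandas are plain punctuation (Po) with no digit value. The helper name `classify` and the sample list are hypothetical, chosen for illustration. The sharing of the dandas across scripts is recorded in the Script_Extensions data (ScriptExtensions.txt), which the stdlib module does not expose, so that part is not shown here.

```python
import unicodedata
from typing import Optional, Tuple


def classify(ch: str) -> Tuple[str, str, Optional[int]]:
    """Return (name, General_Category, decimal digit value or None)."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.decimal(ch, None))


SAMPLES = [
    "\u0966",  # DEVANAGARI DIGIT ZERO: one zero per script
    "\u09E6",  # BENGALI DIGIT ZERO
    "\u0BE6",  # TAMIL DIGIT ZERO
    "\u0964",  # DEVANAGARI DANDA: shared punctuation, no per-script copies
    "\u0965",  # DEVANAGARI DOUBLE DANDA
]

if __name__ == "__main__":
    for ch in SAMPLES:
        name, gc, value = classify(ch)
        print(f"U+{ord(ch):04X} {name:<24} gc={gc} decimal={value}")
```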
