RE: numeric properties of Nl characters in the UCD

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Nov 27 2003 - 08:00:43 EST

    Arcane Jill writes:

    > Gotcha. It's all starting to make sense now. Including the opposition
    > to hex.
    >
    > Maybe one could make "circled 92" in two stages:
    > (1) create a glyph representing 92, then (2)
    > apply an enclosing circle modifier to it.
    >
    > Except of course, that wouldn't work!
    > Because a modifier only affects a single base character.

    This is true unless the base character is linked to the preceding
    characters by something like ZWJ, which creates a ligature opportunity.
    (ZWJ offers no guarantee that the ligature or joining will actually be
    applied at rendering time, and it does not affect the semantics of the
    text, as it is just a formatting control.)
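
    To make the "single base character" point concrete, here is a minimal
    Python sketch. The helper combining_sequences() is a hypothetical name,
    and its rule (attach every character of category M* to the preceding
    character) is a deliberate simplification, not the full segmentation
    algorithm of UAX #29.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE
ZWJ = "\u200D"               # ZERO WIDTH JOINER

def combining_sequences(text):
    """Split text into base characters, each followed by its combining marks.

    Simplified rule (an assumption of this sketch): a character whose
    General_Category starts with 'M' attaches to the preceding character;
    anything else starts a new sequence.
    """
    sequences = []
    for ch in text:
        if sequences and unicodedata.category(ch).startswith("M"):
            sequences[-1] += ch
        else:
            sequences.append(ch)
    return sequences

# "9" + encircle("2"): the enclosing mark lands on the "2" only.
parts = combining_sequences("9" + "2" + ENCLOSING_CIRCLE)
print(len(parts), [[f"U+{ord(c):04X}" for c in p] for p in parts])

# ZWJ is a format control (Cf), not a combining mark.
print(unicodedata.category(ENCLOSING_CIRCLE), unicodedata.category(ZWJ))
```

    The digit nine ends up in its own sequence, which is why a renderer
    that follows the standard model draws the circle around the "2" alone.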

    > Basically, you'd need to do: encircle( "9" + "2")
    > instead of: "9" + encircle("2")

    You're right here: simple concatenation with + is not intended to extend
    the semantics of the separate encircle() transformation function.

    That is, if ZWJ actually created a "semantic" ligature:

            encircle(<DIGIT NINE, DIGIT TWO>)
            ~~ encircle(<DIGIT NINE, ZWJ, DIGIT TWO>)
            ~~ <DIGIT NINE, ZWJ, DIGIT TWO, COMBINING ENCLOSING CIRCLE>

    or, more consistently (more complicated to implement in an encircle()
    function, but probably simpler to parse and render correctly, by noting
    that the two combining sequences on each side of ZWJ share a common
    "encircled" rendering property, which could then be "factored out" when
    looking for the range of characters to which the enclosing property
    should be applied):

            == <DIGIT NINE, COMBINING ENCLOSING CIRCLE, ZWJ,
                DIGIT TWO, COMBINING ENCLOSING CIRCLE>
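
    The two hypothetical encodings above can be sketched in Python as
    follows. Neither form is defined by the standard; the function names
    encircle_joined() and encircle_factored() are made up for this
    illustration, and the code only builds the character sequences, it says
    nothing about how a renderer would treat them.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE
ZWJ = "\u200D"               # ZERO WIDTH JOINER

def encircle_joined(bases):
    """First form: bases linked by ZWJ, one enclosing mark at the very end."""
    return ZWJ.join(bases) + ENCLOSING_CIRCLE

def encircle_factored(bases):
    """Second form: each base carries its own mark, ZWJ between sequences."""
    return ZWJ.join(base + ENCLOSING_CIRCLE for base in bases)

def names(text):
    """Character names, to compare against the <...> notation in the text."""
    return [unicodedata.name(ch) for ch in text]

for form in (encircle_joined("92"), encircle_factored("92")):
    print(", ".join(names(form)))
```

    In the second form a parser would have to notice that both sides of the
    ZWJ end with the same enclosing mark before merging them into one
    enclosed range.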

    But I note that this is not the way the character model was defined.
    In particular, there is the case of "double" diacritics, currently
    encoded as (for example):

            <base letter 1, DOUBLE TILDE, base letter 2>

    and not simply as:

            <base letter 1, TILDE, ZWJ, base letter 2, TILDE>

    as if it were the result of the function:

            tilde(<base letter 1> + <base letter 2>)
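
    As an illustration of the existing double-diacritic encoding, the
    sketch below builds <n, COMBINING DOUBLE TILDE, g> in Python and prints
    the properties of each character. The helper name tilde_pair() and the
    choice of "n" and "g" as base letters are assumptions made for the
    example.

```python
import unicodedata

DOUBLE_TILDE = "\u0360"  # COMBINING DOUBLE TILDE
TILDE = "\u0303"         # COMBINING TILDE

def tilde_pair(first, second):
    """Encode tilde(first + second) the way double diacritics are encoded:
    the double mark follows the first base and visually spans the second."""
    return first + DOUBLE_TILDE + second

sequence = tilde_pair("n", "g")
for ch in sequence:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "ccc =", unicodedata.combining(ch))

# The double mark has its own combining class, distinct from the
# ordinary tilde's class.
print(unicodedata.combining(DOUBLE_TILDE), unicodedata.combining(TILDE))
```

    Note that the second base letter carries no mark of its own; the
    spanning behaviour is a property of the double diacritic itself.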

    So for arbitrary encircled numbers, what would be needed is a "DOUBLE
    ENCLOSING CIRCLE" diacritic (not currently encoded in Unicode, so
    available only through a private-use (PUA) code point), like this:

            encircle(<DIGIT 9, DIGIT 2>)
            == <DIGIT 9, DOUBLE ENCLOSING CIRCLE, DIGIT 2>

    Or for arbitrary numbers:

            encircle(<DIGIT 9, DIGIT 2, DIGIT 3, DOT, DIGIT 0>)
            == <DIGIT 9, DOUBLE ENCLOSING CIRCLE,
                DIGIT 2, DOUBLE ENCLOSING CIRCLE,
                DIGIT 3, DOUBLE ENCLOSING CIRCLE,
                DOT, DOUBLE ENCLOSING CIRCLE,
                DIGIT 0>

    Here there is no ZWJ character at all: it is the double diacritic that
    explicitly creates the ligature between the previous and the next base
    character.
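
    A minimal Python sketch of this convention follows. Since no DOUBLE
    ENCLOSING CIRCLE exists in Unicode, the code borrows the private-use
    code point U+E000 for it; that assignment, like the function name
    encircle_run(), is purely an assumption of this example and would have
    to be agreed upon privately between applications.

```python
import unicodedata

# Hypothetical private-use assignment; Unicode defines no such character.
DOUBLE_ENCLOSING_CIRCLE = "\uE000"

def encircle_run(text):
    """Place the (hypothetical) double enclosing mark between every pair
    of adjacent base characters, so that the whole run is chained."""
    return DOUBLE_ENCLOSING_CIRCLE.join(text)

encoded = encircle_run("923.0")
print([f"U+{ord(ch):04X}" for ch in encoded])

# Private-use characters have General_Category Co, so a generic
# application has no reason to treat U+E000 as a combining mark.
print(unicodedata.category(DOUBLE_ENCLOSING_CIRCLE))
```

    Five base characters need four double marks, one between each adjacent
    pair, matching the sequence written out above.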

    None of these solutions is specified in the standard. Each is purely a
    convention for using Unicode, and until some enhancement to the Unicode
    character model is published that clearly defines ranges of characters
    to which diacritics can be applied, without relying on the overly
    simple ZWJ control, the interpretation of text encoded this way will
    remain application-dependent.

    This archive was generated by hypermail 2.1.5 : Thu Nov 27 2003 - 09:55:51 EST