Complex Combining

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Thu Nov 27 2003 - 10:14:42 EST

    Hmmm.

    I still like the "invisible brackets" idea. That would make the
    precedence explicit. As in:

        INVISIBLE_LEFT_BRACKET + "9" + "2" + INVISIBLE_RIGHT_BRACKET +
        COMBINING_ENCLOSING_CIRCLE

    Totally unambiguous, and would work for /all/ modifiers, not just
    enclosing circle. (You could also use invisible brackets to
    unambiguously reorder combiners!)
    Of course, it would mean the addition of two new characters to Unicode.
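
    A minimal sketch of how a parser might treat such brackets, in Python.
    The two bracket characters do not exist in Unicode; the private-use
    code points U+E000 and U+E001 stand in for them here purely as an
    assumption, and mark_scopes() is a hypothetical helper, not part of
    any library. Nested brackets are not handled.

```python
import unicodedata

# Hypothetical placeholders: no such characters are encoded in Unicode.
INVISIBLE_LEFT_BRACKET = "\uE000"
INVISIBLE_RIGHT_BRACKET = "\uE001"
COMBINING_ENCLOSING_CIRCLE = "\u20DD"


def mark_scopes(text):
    """Return (base, marks) pairs; a bracketed group counts as one base."""
    result = []
    i = 0
    while i < len(text):
        if text[i] == INVISIBLE_LEFT_BRACKET:
            # Everything up to the matching right bracket is the base.
            end = text.index(INVISIBLE_RIGHT_BRACKET, i)
            base = text[i + 1:end]
            i = end + 1
        else:
            base = text[i]
            i += 1
        # Collect the combining marks (general category M*) that follow.
        marks = ""
        while i < len(text) and unicodedata.category(text[i]).startswith("M"):
            marks += text[i]
            i += 1
        result.append((base, marks))
    return result


plain = "92" + COMBINING_ENCLOSING_CIRCLE
bracketed = (INVISIBLE_LEFT_BRACKET + "92" + INVISIBLE_RIGHT_BRACKET
             + COMBINING_ENCLOSING_CIRCLE)
```

    Under these assumptions, mark_scopes(plain) attaches the circle to
    "2" alone, while mark_scopes(bracketed) attaches it to "92" as a
    whole.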

    Would it work? Are there problems I haven't thought of?

    Jill

     -----Original Message-----
    From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
    Sent: Thursday, November 27, 2003 1:01 PM
    To: Arcane Jill
    Cc: Unicode@Unicode.Org
    Subject: RE: numeric properties of Nl characters in the UCD

    Arcane Jill writes:

    > Gotcha. It's all starting to make sense now. Including the opposition
    > to hex.
    >
    > Maybe one could make "circled 92" in two stages:
    > (1) create a glyph representing 92, then (2)
    > apply an enclosing circle modifier to it.
    >
    > Except of course, that wouldn't work!
    > Because a modifier only affects a single base character.

    This is true unless the base character is linked to the preceding
    characters by something like ZWJ, which creates a ligature opportunity
    (but ZWJ offers no guarantee that the ligature or joining will
    actually be applied in rendering, and it does not affect the semantics
    of the text, as it is just a formatting control).

    > Basically, you'd need to do: encircle( "9" + "2")
    > instead of: "9" + encircle("2")

    You're right here: simple concatenation with + is not intended to
    extend the semantics of the separate encircle() transformation function.

    That is, if ZWJ actually created a "semantic" ligature:

        encircle(<DIGIT NINE, DIGIT TWO>)
        ~~ encircle(<DIGIT NINE, ZWJ, DIGIT TWO>)
        ~~ <DIGIT NINE, ZWJ, DIGIT TWO, COMBINING ENCLOSING CIRCLE>

    or, more consistently (more complicated to implement in an encircle()
    function, but probably simpler to parse and render correctly, by noting
    that the two combining sequences on either side of ZWJ share a common
    "encircled" rendering property, which could then be "factored out" when
    looking up the range of characters to which the enclosing property
    should be applied):

        == <DIGIT NINE, COMBINING ENCLOSING CIRCLE, ZWJ,
            DIGIT TWO, COMBINING ENCLOSING CIRCLE>

    But I note that this is not how the character model was defined.
    In particular, we have the case of "double" diacritics, currently
    encoded as (for example):

        <base letter 1, DOUBLE TILDE, base letter 2>

    and not simply as:

        <base letter 1, TILDE, ZWJ, base letter 2, TILDE>

    as if it were the result of the function:

        tilde(<base letter 1> + <base letter 2>)
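
    The difference between the two encodings can be inspected with
    Python's unicodedata module. This is only an illustrative sketch:
    the base letters "a" and "b" are arbitrary choices, describe() is a
    hypothetical helper, and the ZWJ-based form is the alternative
    discussed above, not something the standard defines.

```python
import unicodedata

COMBINING_TILDE = "\u0303"
COMBINING_DOUBLE_TILDE = "\u0360"
ZWJ = "\u200D"

# Encoding actually used: the double diacritic sits between the two bases.
actual = "a" + COMBINING_DOUBLE_TILDE + "b"

# Alternative discussed above (not how the character model was defined):
# one ordinary tilde per base letter, with the bases linked by ZWJ.
alternative = "a" + COMBINING_TILDE + ZWJ + "b" + COMBINING_TILDE


def describe(text):
    """List each code point's name and canonical combining class."""
    return [(unicodedata.name(ch), unicodedata.combining(ch)) for ch in text]


for label, text in (("actual", actual), ("alternative", alternative)):
    print(label)
    for name, ccc in describe(text):
        print("   ", name, ccc)
```

    The double tilde is a single combining mark with its own canonical
    combining class, distinct from that of the ordinary tilde, so the
    first form needs three code points where the second needs five.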

    So for arbitrary encircled numbers, what would be needed is a "DOUBLE
    ENCLOSING CIRCLE" diacritic (not currently encoded in Unicode, except
    through PUA assignments), like this:

        encircle(<DIGIT 9, DIGIT 2>)
        == <DIGIT 9, DOUBLE ENCLOSING CIRCLE, DIGIT 2>

    Or for arbitrary numbers:

        encircle(<DIGIT 9, DIGIT 2, DIGIT 3, DOT, DIGIT 0>)
        == <DIGIT 9, DOUBLE ENCLOSING CIRCLE,
            DIGIT 2, DOUBLE ENCLOSING CIRCLE,
            DIGIT 3, DOUBLE ENCLOSING CIRCLE,
            DOT, DOUBLE ENCLOSING CIRCLE,
            DIGIT 0>

    Here there is no ZWJ character at all: it is the double diacritic
    that explicitly creates the ligature between the previous and next
    base characters.
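
    The interleaving convention above could be sketched as follows. The
    DOUBLE ENCLOSING CIRCLE is not encoded, so the private-use code
    point U+E002 is an arbitrary stand-in chosen for this example, and
    encircle() and encircled_bases() are hypothetical helpers that
    assume the input holds plain base characters only.

```python
# Hypothetical: U+E002 is a private-use placeholder, not a real assignment.
DOUBLE_ENCLOSING_CIRCLE = "\uE002"


def encircle(bases):
    """Place the double diacritic between each pair of adjacent bases."""
    return DOUBLE_ENCLOSING_CIRCLE.join(bases)


def encircled_bases(encoded):
    """Recover the base characters of a run built by encircle()."""
    return "".join(encoded.split(DOUBLE_ENCLOSING_CIRCLE))


encoded = encircle("923.0")
# Five base characters are linked by four double diacritics.
print([f"U+{ord(ch):04X}" for ch in encoded])
```

    A run of n base characters needs n - 1 double diacritics, and a
    renderer following this convention would have to merge the whole
    chain into a single enclosing shape.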

    None of these solutions is specified in the standard. Each is purely a
    convention for using Unicode, and until some enhancement to the Unicode
    character model is published that clearly defines ranges of characters
    to which diacritics can be applied, without relying on the overly
    simple ZWJ control, the interpretation of text encoded this way will
    remain application-dependent.



    This archive was generated by hypermail 2.1.5 : Thu Nov 27 2003 - 10:58:18 EST