Re: [BULK] - Re: markup on combining characters

From: Philippe Verdy
Date: Fri Sep 10 2004 - 18:59:42 CDT

    From: "Asmus Freytag" <>
    > On the other hand, all aspects to *coloring* of characters
    > do not belong in the plain text stream - but that was not
    > the question.
    > I think suggested solutions that define markup that apply to
    > combining characters but place that markup outside of the
    > combining sequence would be a better answer than protocols
    > trying to put markup inside the combining character sequence.
    > My personal take is that the UTC might make a recommendation
    > to that effect, but it's not part of the standard proper.
    > It's not clear that the issue has practical urgency - if
    > I should be mistaken on that, I'd like to find out how and why.

    Placing markup outside the combining sequence seems attractive at first,
    but it raises another difficulty: how to refer to the parts of a combining
    sequence (I did not say "parts of characters", because I agree that
    combining characters are not parts of characters but are true abstract
    characters per the Unicode definition) when combining sequences are
    themselves subject to transformations such as normalization.
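
    To make the difficulty concrete, here is a minimal Python sketch (using only
    the standard unicodedata module; the variable names are illustrative) showing
    that the number of code points in a combining sequence, and therefore any
    index into it, depends on the normalization form:

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))  # 2: the accent can be addressed as "character 2"
print(len(composed))    # 1: after NFC there is no second character to address
```

    A reference such as "the second character" is therefore only meaningful
    once the normalization form has been fixed.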

    One solution would be to specify in the markup which normalization to apply
    to the combining sequence before referring to its component characters,
    with some syntax like:
        <font style="color:red nfd(2,1);">e&combining-acute;</font>
    which would survive a normalization of the document, such as NFC, as in:
        <font style="color:red nfd(2,1);">&e-with-acute;</font>
    Here, syntax in the style attribute indicates an explicit NFD normalization
    to apply to the plain-text fragment encoded in the text element, before
    specifying the range of characters to which the style applies. (Here it says
    that color:red applies to only 1 character, starting at the second one in
    the enclosed text fragment, after it has been forced to NFD.)
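
    As a rough illustration of how a renderer might interpret such a range, the
    following Python sketch normalizes the fragment to NFD and flags the selected
    characters. The function name nfd_range is hypothetical, and reading
    nfd(2,1) as "1-based start, then count" is an assumption taken from the
    description above, not an existing style-language feature:

```python
import unicodedata

def nfd_range(text, start, count):
    """Return (char, selected) pairs for the NFD form of text.

    start is 1-based and count is a number of characters, following
    the assumed reading of nfd(2,1) in the example markup.
    """
    nfd = unicodedata.normalize("NFD", text)
    first = start - 1  # convert the 1-based start to a 0-based index
    return [(ch, first <= i < first + count) for i, ch in enumerate(nfd)]

# Both encodings of "e with acute" select the same character:
# the combining acute accent, at position 2 of the NFD form.
from_composed = nfd_range("\u00e9", 2, 1)
from_decomposed = nfd_range("e\u0301", 2, 1)
```

    Because the range is resolved only after forcing NFD, the result is the same
    whether the document was stored in NFC or NFD.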

    Maybe this seems tricky, but simpler solutions could be implemented
    in a style language, such as more basic restrictions expressed through new
    style properties:
        <font style="combining-color:red">&e-with-acute;</font>
    where the new "combining-color" property implies such prenormalization and
    the automatic selection of the character ranges to which coloring applies.
    Maybe there are better solutions that would not require augmenting the style
    language schema with lots of new property names, such as:
        <font style="color:combining(red)">&e-with-acute;</font>
    Here too, Unicode itself is not affected, but markup languages and
    renderers would need significant changes to take the new markup property
    names or values into account.
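
    The "automatic selection" implied by combining-color or combining(red) could
    be sketched in Python as follows. This is only an assumption about what a
    renderer might do: the function name split_base_and_marks is hypothetical,
    and treating every character whose general category starts with "M" as a
    combining mark is one plausible selection rule among others:

```python
import unicodedata

def split_base_and_marks(text):
    """Return (char, is_mark) pairs for the NFD form of text.

    A character counts as a mark when its Unicode general category
    is Mn, Mc or Me (assumed selection rule for the sketch).
    """
    nfd = unicodedata.normalize("NFD", text)
    return [(ch, unicodedata.category(ch).startswith("M")) for ch in nfd]

# The precomposed U+00E9 is decomposed first, so the accent
# is found even though it is not a separate character in the source.
pairs = split_base_and_marks("\u00e9")
```

    A renderer following this rule would apply the color to the characters
    flagged as marks and leave the base characters in the inherited color.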
