Re: [BULK] - Re: markup on combining characters

Date: Wed Sep 15 2004 - 09:08:51 CDT


    > A solution would be to specify in the markup which normalization to apply
    > to the combining sequence before referring to its component characters, with
    > some syntax like:
    > <font style="color:red nfd(2,1);">e&combining-acute;</font>
    > which would survive normalization of the document, such as NFC, in:
    > <font style="color:red nfd(2,1);">&e-with-acute;</font>
    > Here some syntax in the markup style indicates an explicit NFD
    > normalization to apply to the plain-text fragment encoded in the text
    > element, before specifying a range of characters to which the style applies
    > (Here it says that color:red applies to only 1 character starting at the
    > second one in the surrounded text fragment, after it has been forced to NFD
    > normalization).
    > Maybe this seems tricky, but other simplified solutions may be implemented
    > in a style language, such as providing more basic restrictions using new
    > markup attributes:
    > <font style="combining-color:red">&e-with-acute;</font>
    > where the new "combining-color" attribute implies such prenormalization and
    > automatic selection of character ranges to which to apply coloring. Maybe
    > there are better solutions that would not require augmenting the style
    > language schema with lots of new attribute names, such as in:
    > <font style="color:combining(red)">&e-with-acute;</font>
    > Here also, Unicode itself is not affected. But markup languages and
    > renderers are seriously modified to take new markup property names or
    > values into account.
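
    To make the quoted proposal concrete, here is a minimal Python sketch
    of what a renderer would have to do for nfd(2,1): normalize the text
    fragment first, then pick the range. The helper name styled_range and
    its parameters are my own assumptions for illustration, not part of
    any existing style language.

```python
import unicodedata


def styled_range(text, form, start, length):
    """Split text into (before, styled, after) after normalizing it.

    Hypothetical helper: `start` is 1-based, as in the proposed nfd(2,1).
    """
    normalized = unicodedata.normalize(form, text)
    begin = start - 1
    end = begin + length
    return normalized[:begin], normalized[begin:end], normalized[end:]


decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

for fragment in (decomposed, precomposed):
    before, styled, after = styled_range(fragment, "NFD", 2, 1)
    # Both spellings yield the same styled character once NFD is forced.
    print(repr(before), [unicodedata.name(ch) for ch in styled], repr(after))
```

    With NFC instead of NFD the second character does not exist at all,
    which is why the proposal has to name the normalization form in the
    markup itself.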

    I fear that none of the aforementioned solutions would work in many
    real-life situations.

    Let's start out with an arbitrary XML format where we want --- and are
    currently perfectly allowed --- to bring across the following message:


    No luck here when <emphasis>&combining_acute;</emphasis> is suddenly

    Now, we might restructure the file and add an attribute to emphasis:

      <emphasis style="color:combining(red)">e&combining_acute;</emphasis>

    I doubt that these two would be semantically equivalent, but let's
    presuppose for a moment that they are:

    We can now transform it via XSLT to HTML for presentation. What we
    will get is either:

        HEADER etc.
        <font style="color:combining(red)">e'</font> (with ' being the combining acute)

    or

        HEADER etc.
        <font style="color:combining(red)">é</font>

    depending on the normalization that the XSLT processor performs. None
    that I know of will preserve the character entity reference.
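
    The effect can be reproduced without an XSLT processor. The following
    is only a sketch using Python's standard XML parser as a stand-in, and
    it assumes a numeric character reference in place of the
    &combining_acute; entity, which would otherwise need a DTD declaration.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Assumption: a numeric reference stands in for &combining_acute;.
source = '<emphasis style="color:combining(red)">e&#x301;</emphasis>'
element = ET.fromstring(source)

# The parser resolves the reference, so only the character itself survives.
serialized = ET.tostring(element, encoding="unicode")
print("&#x301;" in serialized)          # False

# The two possible outputs differ in how many characters the style spans.
nfd_text = unicodedata.normalize("NFD", element.text)
nfc_text = unicodedata.normalize("NFC", element.text)
print(len(nfd_text), len(nfc_text))     # 2 1
```

    Whichever form the processor emits, the style attribute is carried
    over unchanged, so the receiving application cannot tell from the
    markup alone which of the two it has been handed.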

    Quite apart from the practical problem that a combining-color style
    would probably not be on the W3C CSS group's shortlist of priorities,
    this obliges the browser to perform the required normalization, which
    would IMHO not be trivial to implement and very probably not be on the
    browser manufacturers' shortlist of priorities either.

    Exactly the same holds true if you want to print the original file via

    Currently, e<font style="color:red;">'</font> (with ' again being the
    combining acute) or its XSL-FO equivalent at least displays more or
    less correctly in some browsers (I tried Konqueror 3.2.2 and IE 5.5)
    and XSL-FO renderers (I tried XEP 3.81).
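
    One reason this form is comparatively robust is that the start tag
    sits between the base letter and the combining mark. The Python sketch
    below treats the markup as a plain string, which is an assumption; a
    markup-aware normalizer might behave differently.

```python
import re
import unicodedata

# e outside the element, the combining acute alone inside it.
marked_up = 'e<font style="color:red;">\u0301</font>'

# Treated as a plain string, NFC leaves this unchanged: the acute follows
# the '>' of the start tag, and no precomposed form exists for that pair.
print(unicodedata.normalize("NFC", marked_up) == marked_up)   # True

# Once the tags are stripped, e and the acute are adjacent and compose.
plain = re.sub(r"<[^>]+>", "", marked_up)
composed = unicodedata.normalize("NFC", plain)
print(len(plain), len(composed))                               # 2 1
```

    So the colouring information survives string-level normalization here,
    at the price of leaving the combining mark without its base character
    inside the element.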

    Best regards,


    Marc Wilhelm Küster

    This archive was generated by hypermail 2.1.5 : Wed Sep 15 2004 - 09:24:29 CDT