RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Dec 11 2003 - 05:57:17 EST

    Christopher John Fynn wrote:
    > Peter Kirk wrote:
    > >Consider the following:
    > > (1) <span class="black-text">{U+00E9}</span>
    > > (2) <span class="black-text">e{U+0301}</span>
    > > (3) <span class="black-text">e<span
    > > class="black-text">{U+0301}</span></span>
    > > (4) <span class="black-text">e<span
    > > class="red-text">{U+0301}</span></span>
    > >
    > > I would expect (1), (2) and (3) to be rendered identically, and (4) to
    > > differ only in the colour of the accent, just as it would be (apart from
    > > (1)) if U+0301 were replaced by a regular letter. I am assuming nothing
    > > special defined in the CSS - the behaviour should be the same with a
    > > simple colour attribute. And so I would expect the behaviour of an
    > > in-line span element to be subtly different from its normal behaviour
    > > when the text starts with a combining mark. I think this is what any
    > > naive user would expect in the circumstances, and is also what
    > > is sensible.
    >
    > Problems are still going to arise if properties other than colour differ
    > between the styles "black-text" & "red-text". I don't think it is good
    > practice to introduce mark-up between a simple character and a combining
    > character dependent on it.

    I agree as well, but there are occasions where markup in the middle of a
    combining sequence is unavoidable (notably when adding metadata markup,
    for example in revision notices). Still, if the markup is intended to
    affect the rendering (including foreground or background colors), it
    should be avoided. Combining sequences are intended to be rendered as a
    whole when they occur in a document whose purpose is to be rendered (such
    as an HTML file). This does not apply to documents that are mostly
    structured data, such as XML data files created by dumping the content of
    a database table, or data communication protocols that use XML as their
    transmission syntax. There, even defective combining sequences are valid
    data, and should also be valid Unicode plain text when the datatype of
    the field really is Unicode text.
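
    To make the distinction concrete, here is a minimal sketch in Python. It
    assumes a made-up XML record (the element names row, base and mark are
    hypothetical) and treats any text that begins with a character of general
    category M as a defective combining sequence, which is a simplification
    of the definition in the standard. The helper names are mine, not part
    of any library.

```python
import unicodedata
import xml.etree.ElementTree as ET


def starts_with_combining_mark(text):
    """Return True if text begins with a combining mark (general category M*).

    Simplified test for a defective combining sequence: a mark with no
    base character in front of it.
    """
    return bool(text) and unicodedata.category(text[0]).startswith("M")


def fields_with_defective_start(xml_source):
    """List the tags of elements whose text starts with a combining mark."""
    root = ET.fromstring(xml_source)
    return [el.tag for el in root.iter()
            if starts_with_combining_mark(el.text or "")]


# Cases (1) and (2) from the quoted message are canonically equivalent:
precomposed = "\u00e9"   # U+00E9
decomposed = "e\u0301"   # e followed by U+0301
same = unicodedata.normalize("NFC", decomposed) == precomposed

# Hypothetical data record in which the mark sits in its own field.
# It still parses as well-formed XML; the field is merely flagged.
record = "<row><base>e</base><mark>\u0301</mark></row>"
flagged = fields_with_defective_start(record)
```

    In a data file like this one, flagging the field is about all a generic
    tool can do: the content is valid data, and whether the mark should be
    joined to the preceding field is up to the application.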

    This archive was generated by hypermail 2.1.5 : Thu Dec 11 2003 - 06:41:45 EST