Colouring combining Marks (was: unicode Digest V5 #149)

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Jun 19 2005 - 05:37:14 CDT


     Patrick Andries wrote:

    > And this is why it should not be possible to use these techniques in
    > contextualized or cursive texts with modern days fonts (or cursors
    > apparently for Tamil split vowels whose colour one would want to
    > change to highlight them by first selecting them which is often not
    > possible)?

    Doug Ewell wrote:

    > No, this is why it is not a Unicode problem.

    We had a partial explanation from a FreeType developer. It seems that
    complex layout just becomes too complex if colour (and other attributes;
    perhaps colour alone could be handled) has to be handled as well. (Word may
    have solved this problem, but I don't know for which common fonts or
    versions. I can't repeat John Hudson's success, and I don't know what the
    critical factor is: Word version, font, or Uniscribe version.) There are
    two aspects which make this a Unicode issue:

    (1) It is yet another problem for which Unicode will be blamed - the problem
    goes away if you use typewriter order-based fonts, but these will generally
    not be Unicode-encoded, and the text, stripped of mark-up, will not, for
    Tamil, have Unicode semantics.

    (2) There seems to be a school of thought that one should not put XML-type
    mark-up round individual combining marks. Unfortunately, I'm not sure that
    one can reasonably hope to have the sort of mark-up available for grapheme
    clusters that will affect a specific combining character.
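
    To make the Tamil case concrete, here is a rough Python sketch using only
    the standard unicodedata module. It shows that the split vowel sign O
    (U+0BCA) is a single code point stored after its consonant, that it
    canonically decomposes into U+0BC6 + U+0BBE, and what a typewriter-style
    ordering of the pieces might look like. The function name
    typewriter_order and its handling of only three left-side vowel signs are
    my own assumptions for illustration; it is not the encoding of any real
    font.

```python
import unicodedata

# Tamil vowel signs whose glyph sits to the LEFT of the consonant:
# U+0BC6 (E), U+0BC7 (EE), U+0BC8 (AI).
LEFT_VOWEL_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def typewriter_order(text):
    """Naive illustration (hypothetical helper): decompose split vowels,
    then move each left-side vowel sign in front of the character that
    precedes it in Unicode order."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in LEFT_VOWEL_SIGNS and out:
            out.insert(len(out) - 1, ch)  # place before the preceding character
        else:
            out.append(ch)
    return "".join(out)


def code_points(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]


ko = "\u0B95\u0BCA"  # TAMIL LETTER KA + TAMIL VOWEL SIGN O
print(code_points(ko))                                # Unicode order, composed
print(code_points(unicodedata.normalize("NFD", ko)))  # Unicode order, decomposed
print(code_points(typewriter_order(ko)))              # left part, KA, right part
```

    Only in the decomposed or typewriter-ordered forms is there a separate
    code point for each half of the vowel that mark-up could be wrapped
    around.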

    I admit that both aspects are peripheral to Unicode, but they are not
    totally unrelated. The first is 'marketing', and in the second case it
    could lead to a call for a special base character that would normally be
    deleted as the text either side of the mark-up was spliced together, but
    would occasionally (e.g. at the start of a line) be treated like
    non-breaking space (U+00A0) or the dashed circle Uniscribe inserts for most
    Brahmic scripts when base characters are missing. I hope one would not need
    such a character for each script!
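
    As a sketch of the fallback behaviour described above, the following
    Python fragment gives a mark-up fragment that starts with a combining
    mark a visible base, either U+00A0 or U+25CC DOTTED CIRCLE (which I am
    assuming is the circle Uniscribe shows). The helper name
    with_placeholder_base is hypothetical, and testing only the general
    category of the first character is a simplification.

```python
import unicodedata

NBSP = "\u00A0"           # NO-BREAK SPACE
DOTTED_CIRCLE = "\u25CC"  # assumed to be the circle Uniscribe displays


def with_placeholder_base(fragment, base=NBSP):
    """Hypothetical helper: if a fragment begins with a combining mark
    (general category Mn, Mc or Me), prefix it with a placeholder base;
    otherwise return it unchanged."""
    if fragment and unicodedata.category(fragment[0]).startswith("M"):
        return base + fragment
    return fragment


isolated = "\u0BC6"     # TAMIL VOWEL SIGN E with no base in its span
cluster = "\u0B95\u0BC6"  # KA + VOWEL SIGN E, already has a base

for sample in (isolated, cluster):
    fixed = with_placeholder_base(sample, DOTTED_CIRCLE)
    print([f"U+{ord(c):04X}" for c in fixed])
```

    This only covers display of the isolated fragment; it does not attempt
    the harder part, which is dropping the placeholder when the text on
    either side of the mark-up is spliced back together.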

    This whole topic did come up in December 2003
    (http://www.unicode.org/mail-arch/unicode-ml/y2003-m12/0370.html).
    Jon Hanna suggested the use of SVG fonts, but unfortunately the link he
    gave on 9 December in
    http://www.unicode.org/mail-arch/unicode-ml/y2003-m12/0480.html,
    namely <http://www.w3.org/TR/charmod/benoit.svg>, has vanished.

    As an aside, I'm beginning to get confused by the 'order' terminology. I
    used to assume that visual order was the(?) orderly order the eye would
    follow when reading, but that does not seem to be so for RTL scripts! If I
    want the first sense, do I have to say typewriter order for RTL scripts? Is
    'typewriter order' appropriate for the jamos of hangul? I think I
    understand phonetic order, but there does seem to be special licence for
    out-of-order aspirates, as in Scottish, Irish and American English 'what'
    and I think the Burmese subscript 'h', in addition to the general licence
    for phonetic change and subsequent rule modification, as in English 'make'.

    I suspect 'logical order' really just means, 'the order we like'. I'd be
    interested to see a robust scheme for logical order in Thai, which
    'Cleanicode' apparently requires. I'd also be interested to learn (from
    Peter Constable?) when Thai collation order was made independent of
    syllabification and thus amenable to computerisation.

    Richard.


