Re: Transcoding Tamil in the presence of markup

From: Martin Duerst (duerst@w3.org)
Date: Sun Dec 07 2003 - 12:47:30 EST

    At 23:16 03/12/07 +0900, Jungshik Shin wrote:

    >On Sun, 7 Dec 2003, Peter Jacobi wrote:

    > > So, I'm still wondering whether Unicode and HTML4 will consider
    > > <span style='color:#00f'>&#x0BB2;</span>&#x0BBE;
    > > valid and it is the task of the user agent to make the best out of it.
    >
    > I think this is valid.

    I agree. It is the task of the user agent to make the best out of it,
    and different user agents may currently do different things with it.
    Because this is related to rendering and styling, it seems to make
    sense that this is clarified in the CSS spec (either 2.1 or 3.0).
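
    To make concrete what a user agent is handed here, the following is a
    minimal Python sketch using only the standard library. The tag stripping
    is a deliberately naive regular expression, an assumption made for this
    one-line fragment and not a real HTML parser. It shows that the span
    boundary falls between a base letter and the dependent vowel sign that
    attaches to it.

```python
import html
import re
import unicodedata

# Peter's fragment: the span closes between the consonant and its vowel sign.
fragment = "<span style='color:#00f'>&#x0BB2;</span>&#x0BBE;"

# Naive tag stripping, good enough for this one-line illustration only.
text = html.unescape(re.sub(r"<[^>]+>", "", fragment))

# Code point, character name, and general category for each character.
report = [(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
          for ch in text]
for row in report:
    print(*row)
```

    The second character has general category Mc (spacing combining mark),
    so the colored span covers only part of what is rendered as one unit.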

    >A more interesting case has to do with
    >W3C CHARMOD in which NFC is required/recommended (it's not yet complete
    >and W3C I18N-WG has been discussing it). Consider the following case.
    >
    > &#x0BB2;<span class="left_part">&#x0BC7;</span>
    > <span class="right_part">&#x0BBE;</span>
    >
    >Because <U+0BC7, U+0BBE> is equivalent to U+0BCB, we couldn't use
    >the above if NFC is required even though in legacy TSCII encoding,
    >it's possible.

    Yes, this is a bad idea. But there is Web technology that can do
    this (see below).
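
    The conflict with NFC can be checked directly. The sketch below assumes
    Python's standard unicodedata module and, as a simplification, writes the
    two spans with no whitespace between them and strips the tags with a
    naive regular expression. Once the markup is gone, NFC fuses the two
    vowel-sign parts into a single code point, so there is nothing left for
    the two spans to wrap separately.

```python
import html
import re
import unicodedata

def text_after_markup(fragment: str) -> str:
    """Strip tags (naively) and resolve numeric character references."""
    return html.unescape(re.sub(r"<[^>]+>", "", fragment))

# Jungshik's example, with the spans written back to back (an assumption).
fragment = ('&#x0BB2;<span class="left_part">&#x0BC7;</span>'
            '<span class="right_part">&#x0BBE;</span>')

text = text_after_markup(fragment)
nfc = unicodedata.normalize("NFC", text)

print([f"U+{ord(ch):04X}" for ch in text])
print([f"U+{ord(ch):04X}" for ch in nfc])
```

    The first list has three code points (U+0BB2, U+0BC7, U+0BBE); the
    normalized one has two (U+0BB2, U+0BCB).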

    The basic problem is that one has to draw the line somewhere.
    Sometimes, one would for example like to color the dot on an 'i'.
    In Unicode, it may theoretically be possible (with a dotless 'i'
    and a 'dot above' or some such), but it wouldn't be a real 'i'
    anymore.

    And there is of course a slippery slope. For example, consider
    the crossbar on a 't'. You can't color that, in any encoding.
    But a font designer may want to do that for some instructional
    material, or may want to color all serifs in a font, and so on.

    Similar examples exist in almost any other script. For most
    intents and purposes, most people are okay with what they
    can and can't do, but occasionally, we come close to the
    dividing line, and some of us are quite surprised. But somehow,
    we have to agree on what's a character and what's only a glyph,
    and we have to agree which combinations are canonically equivalent.

    >The same is true of Korean syllables(see below) as
    >Philippe pointed out.
    >
    > &#x1100;<span class="vowel">&#x1161;</span>&#x11a8;

    Yes. Korean is particularly difficult because it is the most
    logical, well-designed script in the world. It has more
    clearly identifiable hierarchical levels than any other
    script. It is very difficult to agree on the level at which
    characters should be encoded.
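
    The same effect can be worked through for the Korean example. The sketch
    below assumes the Hangul syllable composition arithmetic from the Unicode
    Standard; the constant and function names are my own. The three jamo in
    the quoted markup compose to one precomposed syllable, which is also what
    NFC produces, so the vowel can no longer be wrapped on its own.

```python
import unicodedata

# Constants of the Hangul composition algorithm (names are illustrative).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compose leading consonant, vowel, and optional trailing consonant."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

jamo = "\u1100\u1161\u11A8"          # the three characters from the example
syllable = compose_hangul(*jamo)      # arithmetic composition
nfc = unicodedata.normalize("NFC", jamo)

print(f"U+{ord(syllable):04X}", syllable == nfc)
```

    Here the indices are 0, 0, and 1, giving 0xAC00 + (0 * 21 + 0) * 28 + 1,
    that is U+AC01.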

    As an example, the vowel pairs a/ya, o/yo, u/yu, and so on
    are distinguished by changing from one small stroke to two
    small strokes. A Web page for children or foreigners may
    want to color these strokes separately. With the current
    encoding(s) in Unicode this is not possible, but I'm sure
    somebody has designed an encoding where this would be possible.

    So while this does not solve Peter's immediate problem,
    starting to change Unicode to color characters, glyphs,
    or character parts would be an extremely slippery slope.

    Working on better font technology seems to be much better
    suited to do the job. And such technology actually is
    already around. It's part of SVG. Chris Lilley had a
    very nice example once, but it got lost in a hard disk crash.
    Chris, any chance of getting a new example?

    SVG (http://www.w3.org/Graphics/SVG/ http://www.w3.org/TR/SVG11/)
    is the XML-based vector graphics format for the Web.
    Here is more or less how it works (as far as I understand it):

    In SVG Fonts (http://www.w3.org/TR/SVG11/fonts.html),
    SVG itself is used to describe glyph shapes. This means
    that all kinds of graphic features are available, including
    coloring of course, but also animation and so on.
    But of course you don't want the colors to be fixed in the font.
    So glyphs in a font, or parts of glyphs, also allow
    the 'class' attribute. You can mark glyphs or glyph
    components with things such as class='accent' or
    class='crossbar', and the rendering of the pieces
    in each class can then be controlled from a CSS
    stylesheet. (I hope I got the details right.)
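
    A rough sketch of what such a font might look like follows. It is held
    in a Python string so that it can at least be checked for
    well-formedness. The font name DemoFont, the path data, and the class
    name 'stem' are all made up for illustration, and whether a given
    renderer applies the stylesheet to glyph parts is not verified here.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical one-glyph SVG font: a 't' built from two classed pieces.
SVG_DOC = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="120">
  <style type="text/css">
    .stem     { fill: black; }
    .crossbar { fill: red; }
  </style>
  <defs>
    <font id="DemoFont" horiz-adv-x="600">
      <font-face font-family="DemoFont" units-per-em="1000"/>
      <missing-glyph/>
      <glyph unicode="t" horiz-adv-x="600">
        <path class="stem" d="M250 0 h100 v700 h-100 z"/>
        <path class="crossbar" d="M100 400 h400 v80 h-400 z"/>
      </glyph>
    </font>
  </defs>
  <text x="20" y="100" font-family="DemoFont" font-size="100">t</text>
</svg>"""

# Parse the document and list the classes attached to the glyph's parts.
root = ET.fromstring(SVG_DOC)
glyph = root.find(f".//{{{SVG_NS}}}glyph")
classes = [part.get("class") for part in glyph]
print(glyph.get("unicode"), classes)
```

    The character data in the text element stays a plain 't'; only the font
    and the stylesheet know about the crossbar.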

    Regards, Martin.
