RE: Arabic letters separated by markup

From: James Kass (
Date: Thu Jun 16 2005 - 20:33:23 CDT

    Jony Rosenne wrote,
    >> Inserting mark-up tags between characters which would normally
    >> ligate or shape or re-position breaks the run of text.
    > I think that the high level protocol, such as HTML or CSS or XML, should
    > define that.
    > My reading of HTML and CSS is that inline markup does not break
    > the run.

    It probably shouldn't, for any text processing function other than display.

    We don't expect characters to ligate if a graphic file is inserted between
    them, nor if a line feed or paragraph break is inserted. We don't
    expect a ligated glyph to cross table boundaries.

    A single ligature glyph can't be composed of components from two
    different fonts. Changing the font by inserting a font-changing tag
    breaks the run of text. This isn't limited to font-face, this also applies
    to font style, like bold or italic. Try this at home: "<i>BACK</i>HOUSE"
    or "<i>Falstaff</i>". In the first experiment, depending on the font, the
    'K' probably overstrikes the 'H'. In the second experiment, depending on the
    font, the final 'f' may overstrike the ending ASCII quote, since the 'f'
    is in italics and the ASCII quote isn't. When an author sees this effect, the
    author needs to take corrective measures regardless of what the specs say.
    Even if kerning were supported in browsers, kerning data from one font
    doesn't apply to another. Likewise for mark positioning data, etc.
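
    To make the "BACK"/"HOUSE" experiment concrete, here is a minimal
    sketch of how a display engine might split marked-up text into runs
    wherever the font style changes. It is an illustration only, not how
    any particular browser works. The names StyleRunSplitter,
    split_into_style_runs, pairs_across_runs and the FONT_CHANGING_TAGS
    set are all hypothetical.

```python
from html.parser import HTMLParser

# Assumption for this sketch: these tags switch to a different font or style.
FONT_CHANGING_TAGS = {"i", "b", "em", "strong"}


class StyleRunSplitter(HTMLParser):
    """Collects (text, active_styles) runs; a style change starts a new run."""

    def __init__(self):
        super().__init__()
        self.active = []  # stack of currently open font-changing tags
        self.runs = []    # list of (text, frozenset of active tags)

    def handle_starttag(self, tag, attrs):
        if tag in FONT_CHANGING_TAGS:
            self.active.append(tag)

    def handle_endtag(self, tag):
        if tag in FONT_CHANGING_TAGS and tag in self.active:
            # Close the innermost open occurrence of this tag.
            idx = len(self.active) - 1 - self.active[::-1].index(tag)
            del self.active[idx]

    def handle_data(self, data):
        style = frozenset(self.active)
        if self.runs and self.runs[-1][1] == style:
            # Same style as the previous piece: extend the current run.
            text, _ = self.runs[-1]
            self.runs[-1] = (text + data, style)
        else:
            self.runs.append((data, style))


def split_into_style_runs(markup):
    parser = StyleRunSplitter()
    parser.feed(markup)
    parser.close()
    return parser.runs


def pairs_across_runs(runs):
    """Adjacent character pairs that straddle a run boundary."""
    return [(a[0][-1], b[0][0]) for a, b in zip(runs, runs[1:]) if a[0] and b[0]]


runs = split_into_style_runs("<i>BACK</i>HOUSE")
print(runs)
print(pairs_across_runs(runs))
```

    The 'K'/'H' pair is the only one that straddles the boundary. Each
    side comes from a different font, so neither font's kerning or
    ligature table covers the pair.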

    > The display engine should extract the plain text of the run, apply
    > the relevant Unicode algorithms such as bidi, mirroring and Arabic shaping,
    > and then apply the rich text decoration to the result as best as possible.

    Agreed. But inserting a tag indicates that the text run is completed
    as far as the display engine is concerned. Subsequent text begins a
    new run. Although I'm not quite sure what is meant by "inline markup",
    any tag represents a boundary for text display processing purposes.
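
    For comparison, the approach Jony describes (extract the plain text,
    shape it as a whole, then apply the decoration) could be sketched as
    below. The plain text is recovered along with offset ranges recording
    where each tag applied, so that shaping could see the whole string
    and decoration could be mapped back afterwards. This is a sketch
    under assumptions: PlainTextWithSpans and extract are hypothetical
    names, offsets are counted in code points, unclosed tags are simply
    ignored, and the shaping step itself is not shown.

```python
from html.parser import HTMLParser


class PlainTextWithSpans(HTMLParser):
    """Separates markup into plain text plus (start, end, tag) ranges."""

    def __init__(self):
        super().__init__()
        self.pieces = []  # text fragments in document order
        self.length = 0   # code points of plain text seen so far
        self.open = []    # stack of (tag, start offset)
        self.spans = []   # completed (start, end, tag) ranges

    def handle_starttag(self, tag, attrs):
        self.open.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag name, if any.
        for i in range(len(self.open) - 1, -1, -1):
            if self.open[i][0] == tag:
                _, start = self.open.pop(i)
                self.spans.append((start, self.length, tag))
                break

    def handle_data(self, data):
        self.pieces.append(data)
        self.length += len(data)


def extract(markup):
    parser = PlainTextWithSpans()
    parser.feed(markup)
    parser.close()
    return "".join(parser.pieces), parser.spans


# Three Arabic letters (beh, yeh, teh) with the middle one marked up.
text, spans = extract("\u0628<b>\u064a</b>\u062a")
print(len(text), spans)
```

    Here the three letters come out as one contiguous string, with the
    markup reduced to the range (1, 2, "b"). Whether a display engine can
    then honour that range on a shaped or ligated glyph is exactly the
    hard part discussed in this thread.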

    Suppose we ask the W3C to devise some kind of CSS syntax for
    colouring portions of glyphs, assuming we can find some W3C
    members who aren't too busy deprecating everything simple
    and elegant about HTML.

    Surely W3C could construct some over-complicated syntax for this,
    perhaps involving counting pixels, converting them to font-units,
    and dividing the results by the square root of pi or something, and
    then specifying colour codes and x/y co-ordinates on a glyph-by-glyph
    basis. Such an approach would be font-specific, though. If the visitor
    lacks the author's installed font, the display could be skewed.

    Perhaps it would be possible for the W3C to insist on enabling colour
    changing tags between base character and mark character, though, as long
    as the font is not being changed. If the markup-language specs should require such
    a feature, I don't envy the programmers who make the browsers work.

    Best regards,

    James Kass
