RE: Arabic letters separated by markup

From: James Kass (jameskass@att.net)
Date: Thu Jun 16 2005 - 20:33:23 CDT

Next message: Mark E. Shoulson: "Re: Arabic letters separated by markup"

Previous message: JFC (Jefsey) Morfin: "Re: Hexatridecimal"
Maybe in reply to: Andreas Prilop: "Arabic letters separated by markup"
Next in thread: Richard Wordingham: "Re: Arabic letters separated by markup"

Jony Rosenne wrote,

>> Inserting mark-up tags between characters which would normally
>> ligate or shape or re-position breaks the run of text.
>
> I think that the high level protocol, such as HTML or CSS or XML, should
> define that.
>
> My reading of HTML and CSS is that inline markup does not break
> the run.

It probably shouldn't, for any text processing function other than display.

We don't expect characters to ligate if a graphic file is inserted between
them, nor if a line feed or paragraph break is inserted. We don't
expect a ligated glyph to cross table-cell boundaries.

A single ligature glyph can't be composed of components from two
different fonts. Changing the font by inserting a font-changing tag
breaks the run of text. This isn't limited to font face; it also applies
to font style, such as bold or italic. Try this at home: <i>BACK</i>HOUSE
and "<i>Falstaff</i>" (with the straight ASCII quotes outside the italic
tags). In the first experiment, depending on the font, the 'K' will
probably overstrike the 'H'. In the second, depending on the font, the
final 'f' may overstrike the closing ASCII quote, since the 'f' is in
italics and the quote isn't. When an author sees this effect, the
author needs to take corrective measures regardless of what the specs say.
Even if kerning were supported in browsers, kerning data from one font
doesn't apply to another. Likewise for mark positioning data, etc.
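To make the run-boundary argument concrete, here is a toy model in Python (an illustrative sketch, not how any particular browser works). Styled text is a list of hypothetical `(text, style)` spans, where "style" stands for everything that selects a font; adjacent spans with the same style merge into one run, and a shaper only ever sees character pairs inside a run. The names `split_runs` and `shapeable_pairs` are made up for this sketch.

```python
from itertools import groupby

# Hypothetical model: each span is (text, style); style stands for
# whatever selects a font (face, bold, italic).
Span = tuple[str, str]

def split_runs(spans: list[Span]) -> list[Span]:
    """Merge adjacent spans that share a style into a single run."""
    runs = []
    for style, group in groupby(spans, key=lambda s: s[1]):
        runs.append(("".join(text for text, _ in group), style))
    return runs

def shapeable_pairs(spans: list[Span]) -> set[tuple[str, str]]:
    """Adjacent character pairs the shaper sees together, i.e. pairs that
    could be kerned or ligated. Pairs straddling a run boundary are absent."""
    pairs = set()
    for text, _ in split_runs(spans):
        pairs.update(zip(text, text[1:]))
    return pairs

backhouse = [("BACK", "italic"), ("HOUSE", "regular")]
falstaff = [('"', "regular"), ("Falstaff", "italic"), ('"', "regular")]

print(split_runs(backhouse))                  # [('BACK', 'italic'), ('HOUSE', 'regular')]
print(("K", "H") in shapeable_pairs(backhouse))  # False: no kerning across the tag
print(("f", '"') in shapeable_pairs(falstaff))   # False: 'f' and quote never meet
```

Inside a run the shaper still sees everything, so the "ff" in Falstaff remains a ligature candidate; only the pairs that straddle a tag are lost.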

> The display engine should extract the plain text of the run, apply
> the relevant Unicode algorithms such as bidi, mirroring and Arabic shaping,
> and then apply the rich text decoration to the result as best as possible.

Agreed. But inserting a tag indicates that the text run is complete
as far as the display engine is concerned. Subsequent text begins a
new run. Although I'm not quite sure what is meant by "inline markup",
any tag represents a boundary for text display processing purposes.
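A minimal Python sketch of the two positions in this exchange, applied to the thread's subject: `shape_per_run` treats every tag as a run boundary, as described above, while `shape_whole_text` follows the quoted proposal of shaping the extracted plain text first. The joining rule is heavily simplified and covers only a few hand-picked letters (real shapers use the Joining_Type data in Unicode's ArabicShaping.txt, plus transparent marks, ZWJ/ZWNJ and more); the function names and the sample markup are hypothetical.

```python
# Simplified joining classes for a small, hand-picked set of letters.
DUAL = set("\u0628\u0633\u0644\u0645")   # BEH, SEEN, LAM, MEEM: join on both sides
RIGHT = set("\u0627\u062F\u0631\u0648")  # ALEF, DAL, REH, WAW: join to preceding only

def joins_forward(ch):
    """Can this letter connect to the letter that follows it?"""
    return ch in DUAL

def joins_backward(ch):
    """Can this letter connect to the letter that precedes it?"""
    return ch in DUAL or ch in RIGHT

def forms(text):
    """Contextual form of each letter, looking only inside this one run."""
    result = []
    for i, ch in enumerate(text):
        prev = i > 0 and joins_forward(text[i - 1]) and joins_backward(ch)
        nxt = i + 1 < len(text) and joins_forward(ch) and joins_backward(text[i + 1])
        if prev and nxt:
            result.append("medial")
        elif prev:
            result.append("final")
        elif nxt:
            result.append("initial")
        else:
            result.append("isolated")
    return result

def shape_per_run(runs):
    """Every markup boundary ends a run: shape each run independently."""
    return [f for run in runs for f in forms(run)]

def shape_whole_text(runs):
    """Shape the concatenated plain text, ignoring markup boundaries."""
    return forms("".join(runs))

# BEH SEEN MEEM with a hypothetical tag after BEH,
# e.g. <span style="color:red">ب</span>سم
runs = ["\u0628", "\u0633\u0645"]
print(shape_per_run(runs))     # ['isolated', 'initial', 'final']
print(shape_whole_text(runs))  # ['initial', 'medial', 'final']
```

In the per-run case the red BEH is drawn in its isolated form and the word falls apart visually, which is the effect the thread's original question describes.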

Suppose we ask the W3C to devise some kind of CSS syntax for
colouring portions of glyphs, assuming we can find some W3C
members who aren't too busy deprecating everything simple
and elegant about HTML.

Surely the W3C could construct some over-complicated syntax for this,
perhaps involving counting pixels, converting them to font-units,
and dividing the results by the square root of pi or something, and
then specifying colour codes and x/y co-ordinates on a glyph-by-glyph
basis. Such an approach would be font-specific, though. If the visitor
lacks the author's installed font, the display could be skewed.

Perhaps it would be possible for the W3C to insist on enabling colour-changing
tags between a base character and a mark character, though, as long
as the font is not being changed. If the markup specs should require such
a feature, I don't envy the programmers who make the browsers work.
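The compromise in the paragraph above amounts to splitting shaping runs only on font changes and carrying colour through as a per-character attribute to be applied after shaping. A toy Python sketch of that segmentation, assuming a hypothetical `(text, font, colour)` span model; the font names are placeholders, and how a real browser would map those colours onto shaped glyphs (the hard part the paragraph alludes to) is not modelled at all.

```python
from itertools import groupby

# Hypothetical span model: (text, font, colour). In this model only the
# font affects glyph selection, so only a font change starts a new run.
Span = tuple[str, str, str]

def shaping_runs(spans: list[Span]) -> list[tuple[str, str, list[str]]]:
    """Group adjacent spans by font, keeping one colour per character so
    colour can be applied to the glyphs after the run is shaped."""
    runs = []
    for font, group in groupby(spans, key=lambda s: s[1]):
        text, colours = "", []
        for span_text, _, colour in group:
            text += span_text
            colours.extend([colour] * len(span_text))
        runs.append((font, text, colours))
    return runs

# BEH followed by a combining KASRA (U+0650) in another colour, same font:
# one shaping run, so the font's mark positioning data still applies.
same_font = [("\u0628", "FontA", "black"), ("\u0650", "FontA", "red")]
# Same pair with a font change between them: two runs, so in this model
# the mark is shaped apart from its base.
font_change = [("\u0628", "FontA", "black"), ("\u0650", "FontB", "red")]

print(len(shaping_runs(same_font)))    # 1
print(len(shaping_runs(font_change)))  # 2
```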

Best regards,

James Kass


This archive was generated by hypermail 2.1.5 : Thu Jun 16 2005 - 20:54:37 CDT