Re: Caret

From: Philippe Verdy <>
Date: Thu, 29 Nov 2012 15:39:03 +0100

Another complication: it is probably possible to style components of a
ligature (or of any sequence of characters where glyph substitution or
positioning is expected to occur) with just distinct colors or backgrounds,
using techniques based on component glyph metrics, as long as the style does
not change the font selection. When a component is made bold or italic, is
given a different font size, or even gets a text decoration such as
underlining, the ligature is broken. Blinking or shadowing effects may still
be generated, though.

* <span style="color:black">ef<span style="color:blue">f</span><span
  (still showing the ffi ligature but each ligatured letter still colored

* <span style="color:black">ef<b>f</b>icient.</span>
  (the ffi ligature is broken by bolding the central f, which changes the
  font selection.)
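A minimal Python sketch of this rule, assuming a simple model where each style is a dict of CSS-like properties: a component styled with paint-only properties (color, background, shadow) keeps the ligature, while any other change is conservatively treated as breaking it, as bold, italic, font size and underlining do in the examples. The `PAINT_ONLY` set and `ligature_survives` function are hypothetical, not a browser API.

```python
# Hypothetical model: styles are dicts of CSS-like property names to values.
# Paint-only properties can be applied to ligature components; per the
# examples, bold, italic, font size and underlining break the ligature.
# Unknown properties are conservatively treated as breaking it.
PAINT_ONLY = {"color", "background-color", "text-shadow"}


def ligature_survives(outer: dict, inner: dict) -> bool:
    """Return True if styling one component with `inner`, inside text styled
    with `outer`, changes only paint-only properties."""
    effective = {**outer, **inner}  # the inner style overrides the outer one
    changed = {name for name, value in effective.items()
               if outer.get(name) != value}
    return changed <= PAINT_ONLY


if __name__ == "__main__":
    # ef + blue f: only the color differs
    print(ligature_survives({"color": "black"}, {"color": "blue"}))       # True
    # ef<b>f</b>icient.: bold changes the font selection
    print(ligature_survives({"color": "black"}, {"font-weight": "bold"}))  # False
```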

But in HTML this is not expected to work, because you would need to split
combining sequences into separate spans for their components, creating
defective sequences. Several HTML validators detect defective sequences
(which also cause another problem with normalization of the HTML source
code). But maybe you could make these sequences non-defective by prefixing
them with a CGJ (U+034F COMBINING GRAPHEME JOINER) in your HTML source. The
delimitation of combining sequences is still incorrect if you just look at
the HTML source, but this is not dramatic because the HTML parser ignores
the delimitation of combining sequences, and at least normalization of the
source document is no longer an issue:

* <span style="color:black">cafe<span
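A rough Python illustration of why the leading CGJ helps at the source level, assuming a validator that flags text nodes beginning with a character of nonzero canonical combining class (a simplification of the "composing character" notion used for normalization checks). The helper `starts_with_composing` is hypothetical; U+034F COMBINING GRAPHEME JOINER has combining class 0, so a span starting with it no longer starts with a composing mark.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER (canonical combining class 0)
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT (canonical combining class 230)


def starts_with_composing(text: str) -> bool:
    """Approximate check: does this text node begin with a combining mark
    that would attach to whatever precedes it in the source?"""
    return bool(text) and unicodedata.combining(text[0]) != 0


if __name__ == "__main__":
    print(starts_with_composing(ACUTE))        # True: defective span content
    print(starts_with_composing(CGJ + ACUTE))  # False: the CGJ holds the accent
```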

This CGJ would then be dropped automatically, long after HTML parsing into
the DOM and font selection, but before BiDi ordering is computed and before
glyphs are processed in the text renderer, so it is never part of glyph
processing (normalization of the successive spans of text remains
possible). But I wonder whether a text search for "café" in the HTML
browser will match if there's a CGJ in the middle (this is a problem of how
collation is performed, because normalization does not remove the CGJ).
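The normalization side of the question can be checked in Python with `unicodedata.normalize`: NFC composes e + U+0301 into é, but leaves the CGJ in place and the CGJ blocks the composition, so a search that only normalizes will not find "café". The `search_key` function, which simply drops the CGJ before normalizing, is a hypothetical stand-in for a collation-aware comparison, not a description of what any browser's find-in-page actually does.

```python
import unicodedata

CGJ = "\u034F"
CAFE = "caf\u00E9"  # "café" with precomposed U+00E9


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def search_key(s: str) -> str:
    """Hypothetical matching key: remove CGJ (a default-ignorable character),
    then normalize, roughly as a collation-based search might."""
    return nfc(s.replace(CGJ, ""))


plain = "cafe\u0301"            # spans joined without a holder
held = "cafe" + CGJ + "\u0301"  # spans joined with the CGJ holder

print(nfc(plain) == CAFE)                    # True: e + U+0301 composes to é
print(nfc(held) == held)                     # True: CGJ is kept, blocks composition
print(CAFE in nfc(held))                     # False: normalization alone misses it
print(search_key(CAFE) in search_key(held))  # True
```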

(Using a zero-width space or a joiner control as the holder instead would
prevent the ligature or combining sequence from appearing.)

This still complicates renderers, because they need to process all spans
that are on the same line and use the same font selection as a single
sequence.
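A minimal sketch of that renderer-side step, assuming a hypothetical span model with a font-selection key and a paint color: consecutive spans sharing the same key are concatenated into one shaping run, so substitution sees "efficient." as a whole, while colors are kept per character for painting later. A real renderer would also split runs by script and direction; that is left out here.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Span:
    text: str
    font: str   # hypothetical font-selection key (family, weight, style, size)
    color: str  # paint-only attribute


def shaping_runs(spans):
    """Merge consecutive spans with the same font key into shaping runs.

    Returns a list of (font, text, colors) where `colors` has one entry per
    code point of `text`, so paint attributes survive the merge."""
    runs = []
    for font, group in groupby(spans, key=lambda s: s.font):
        group = list(group)
        text = "".join(s.text for s in group)
        colors = [s.color for s in group for _ in s.text]
        runs.append((font, text, colors))
    return runs


colored = [Span("ef", "serif/400", "black"), Span("f", "serif/400", "blue"),
           Span("icient.", "serif/400", "black")]
bolded = [Span("ef", "serif/400", "black"), Span("f", "serif/700", "black"),
          Span("icient.", "serif/400", "black")]

print(len(shaping_runs(colored)))  # 1: the ffi sequence is shaped together
print(len(shaping_runs(bolded)))   # 3: the bold f splits the sequence
```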

Without a CGJ, and if there are no normalization issues when no holder is
inserted, these defective sequences should still not trigger the insertion
of a spacing dotted-circle glyph to hold each defective sequence in the
rendered HTML.
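As an illustration only, here is a hypothetical renderer policy in Python: a dotted circle (U+25CC) is inserted when a shaping run begins with a combining mark. Applied to the merged run, the accent that was isolated in its own span still has its base, so no dotted circle appears; it would only appear if that span were shaped on its own. Real shaping engines have their own rules for when to insert a dotted circle, which are not modeled here.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def with_fallback_base(run_text: str) -> str:
    """Hypothetical policy: if a shaping run starts with a combining mark
    (general category Mn, Mc or Me), prepend a dotted circle to hold it."""
    if run_text and unicodedata.category(run_text[0]).startswith("M"):
        return DOTTED_CIRCLE + run_text
    return run_text


merged = "cafe" + "\u0301"  # "cafe" span followed by the accent span
print(with_fallback_base(merged) == merged)              # True: no dotted circle
print(with_fallback_base("\u0301")[0] == DOTTED_CIRCLE)  # True: isolated accent
```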

However, when viewing or editing the HTML source, the sequences interrupted
by the inserted markup are not defective, as they seem to combine with a '>'
character. An HTML/XML-aware editor may need to apply "syntax coloring" to
this '>' (using a distinct font) to prevent such a combination, so that the
sequence becomes defective and a dotted circle appears.
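For the source-view case, a hypothetical editor helper might first locate the places where a closing '>' is directly followed by a combining mark, so it can put that '>' in its own font run. This Python sketch scans the raw text only (it does not parse attributes, so a '>' inside an attribute value could give a false positive), and the sample markup is a made-up completion of the café example.

```python
import unicodedata


def marks_after_gt(source: str) -> list[int]:
    """Return the indices of combining marks that directly follow a '>'.

    In a plain-text view of the source, each such mark would visually
    combine with the '>', hiding the fact that the sequence is defective."""
    return [i for i in range(1, len(source))
            if source[i - 1] == ">"
            and unicodedata.category(source[i]).startswith("M")]


# Hypothetical markup: the acute accent is placed in its own span.
src = '<span style="color:black">cafe<span style="color:blue">\u0301</span></span>'
hits = marks_after_gt(src)
print(len(hits))  # 1
```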

2012/11/29 QSJN 4 UKR <>

> Philippe Verdy:
> "And what about applying separate styles on components of a cluster (e.g.
> different color to an acute accent) : the difficulty is even worse due to
> selection of fonts and the way text renderers are selecting glyphs in fonts
> and positioning/substituting them (it does not work if glyphs are in
> distinct fonts or if sequences are only rendered correctly by fonts
> performing substitutions)."
> The usual realization of the ligature feature is
> sub f i by f_i;
> while a more usable one for editing, for caret positioning, and for the
> stupid styling of cluster components is
> sub f before i by f.ligaleft;
> sub i after f by i.ligaright;
> pos f.ligaleft i.ligaright…;
> i.e. never use ligature substitution, and use decompositions wherever
> the user wishes. Different fonts, different possibilities :(
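To make the difference between the two quoted approaches concrete, here is a toy Python model. The glyph names come from the quoted feature code; the cluster mapping is an assumption about what each approach produces for the input "fi". With the f_i ligature one glyph covers both characters, so glyph boundaries alone give no caret stop between f and i (a font can add one through GDEF ligature caret data, not modeled here) and per-letter styling has to be faked from component metrics; with f.ligaleft and i.ligaright every character keeps its own glyph.

```python
# Each shaped glyph is (glyph_name, indices of the characters it covers)
# for the input "fi". Glyph names follow the quoted feature code.
LIGATURE = [("f_i", (0, 1))]                                # sub f i by f_i;
ALTERNATES = [("f.ligaleft", (0,)), ("i.ligaright", (1,))]  # contextual forms


def caret_stops(glyphs):
    """Character offsets where a caret can be placed using glyph boundaries
    only (no ligature caret positions from the font)."""
    stops = [0]
    for _name, chars in glyphs:
        stops.append(max(chars) + 1)
    return stops


def one_glyph_per_char(glyphs):
    """True if every glyph maps to exactly one character, so per-character
    styling can be applied glyph by glyph."""
    return all(len(chars) == 1 for _name, chars in glyphs)


print(caret_stops(LIGATURE), one_glyph_per_char(LIGATURE))      # [0, 2] False
print(caret_stops(ALTERNATES), one_glyph_per_char(ALTERNATES))  # [0, 1, 2] True
```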
Received on Thu Nov 29 2012 - 08:41:41 CST

This archive was generated by hypermail 2.2.0 : Thu Nov 29 2012 - 08:41:41 CST