Re: Dotted Circle plus Combining Mark as Text

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sat, 26 Oct 2013 00:41:55 +0200

That's exactly why I asked about how to encode unambiguously in text that we
really want to represent a semantically defective combining sequence (which
should then be rendered depending on the cultural environment, such as
language tagging in the text, or the scripts for which the diacritic is
encoded).

I suggested WJ for this usage (technically, appending a combining mark
after WJ means the sequence is no longer defective, but WJ effectively
blocks reorderings by normalisations, and remains neutral for plain text
searches, without also introducing any new break opportunity). My opinion is
that it is the best "replacement" for the missing base letter or symbol.
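
The normalisation part of this can be checked with a short Python sketch
using the standard unicodedata module. WJ (U+2060) has canonical combining
class 0, so marks on either side are not reordered across it, and WJ
followed by a mark has no composed form. The sample strings are
illustrative choices, not taken from the thread.

```python
import unicodedata

WJ = "\u2060"         # WORD JOINER
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220

# WJ is a starter (combining class 0), which is what stops reordering.
wj_class = unicodedata.combining(WJ)

# Without WJ, canonical ordering moves the dot below before the acute.
plain = unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW)

# With WJ in between, the two marks keep their encoded order.
guarded_input = "a" + ACUTE + WJ + DOT_BELOW
guarded = unicodedata.normalize("NFD", guarded_input)

# WJ followed by a mark is left untouched by all four normalisation forms.
isolated = WJ + ACUTE
stable = all(
    unicodedata.normalize(form, isolated) == isolated
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)
```

This only covers normalisation; it says nothing about how search or
line-breaking implementations treat WJ.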

Then at rendering time, the renderer will compose lines using word breaks
where possible (but not here because WJ behaves like a letter in a word). It
will still apply the Bidi algorithm, and WJ will be ordered by keeping it
with its following diacritics, so they won't be broken in the middle by
bidi reordering.

Then WJ will be discarded (or treated as a zero-width glyph), and the
renderer will have to look at the cluster starting with the combining mark;
as it is ill-formed, it will know that it can render it safely with a base
glyph (like dotted circle). It should then look for an OpenType feature
"ilfm" ("ill-formed marks") to see which glyph to use for the leading
combining mark, treated like a spacing mark.
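
As a rough illustration of that renderer step, here is a minimal Python
sketch that turns the encoded text into a display string. It is not how
any existing renderer works: the function name display_form is
hypothetical, the "ilfm" lookup is left out entirely, and the fallback
base is simply assumed to be U+25CC.

```python
import unicodedata

WJ = "\u2060"             # WORD JOINER
DOTTED_CIRCLE = "\u25cc"  # assumed fallback base glyph


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def display_form(text):
    """Hypothetical sketch: show a base glyph where a mark has no base.

    A WJ directly followed by a mark is replaced by the fallback base.
    A mark at the very start of the text gets the fallback base inserted
    in front of it. Everything else is passed through unchanged.
    """
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == WJ and nxt and is_mark(nxt):
            out.append(DOTTED_CIRCLE)
        elif i == 0 and is_mark(ch):
            out.append(DOTTED_CIRCLE)
            out.append(ch)
        else:
            out.append(ch)
    return "".join(out)
```

For example, display_form("\u2060\u0301") gives U+25CC followed by U+0301,
while a WJ between two letters is left alone.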

WJ will not be needed at the beginning of paragraphs, but should cause no
other problem there. It will never be rendered by itself (except in a
"visible controls edit mode" where it would show by itself, followed by each
separately encoded diacritic rendered in its "ill form" as below, and not
combined together, i.e. without ligatures or substitutions of pairs by one
glyph, or contextual substitutions of isolates).

That OpenType feature could be designed in two ways in fonts:
- it could specify contextual mappings for ranges of combining marks to
substitute them with an *inserted* appropriate base glyph (not necessarily
the same as the one mapped for U+25CC, it could be a dotted arabic sharadah
for example, or a dotted "x" in Thai)
- it could also just specify the single glyph to use for U+25CC rendered
with this feature enabled (easier for many font authors that don't need
fonts covering multiple scripts or cultures): the glyph will then be
different from the default glyph used for U+25CC in encoded texts.
- if the font does not contain this feature, the OpenType renderer will use
any mapping of U+25CC (including from other fonts) and will just use best
efforts to position the diacritic on it (but fonts may still contain
specific substitution mappings for encoded <U+25CC, diacritic> pairs).

The feature would work in "automatic" mode (off by default, except in
contexts of ill-formed marks where it will be activated by the renderer,
unless the document specifies that it should be turned completely off).

Document authors who still want to use a specific "base" character can
still use it *instead* of WJ (to create non-defective sequences):

- whitespaces and cursive joiners
- dashes and hyphens
- U+25CC or other geometric shapes
- multiplication sign, or other (maths) symbols
- Latin letter x (not recommended due to its strong LTR property, and
effects of word breakers working by runs of the same script), etc.

But such a technique may not work in many fonts, which will provide mappings
for (base, diacritic) pairs only for a few well-known "bases":

- NBSP, U+25CC, and
- WJ (preferably via the OpenType "ilfm" feature)
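
For a page that shows diacritics in isolation, the choice of base could be
kept in one place so that it is easy to switch between them. A small
Python sketch, where the names BASES and isolate are hypothetical and only
the three bases listed just above are offered:

```python
import unicodedata

# The few "bases" that fonts are most likely to handle, per the list above.
BASES = {
    "wj": "\u2060",             # WORD JOINER
    "nbsp": "\u00a0",           # NO-BREAK SPACE
    "dotted_circle": "\u25cc",  # DOTTED CIRCLE
}


def isolate(mark, base="wj"):
    """Return the chosen base followed by a single combining mark."""
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected exactly one combining mark")
    return BASES[base] + mark


# U+0981 BENGALI SIGN CANDRABINDU, from the block cited in the quoted mail.
sample = isolate("\u0981")
sample_nbsp = isolate("\u0981", base="nbsp")
```

Which of these actually renders well still depends on the font and the
rendering engine.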

2013/10/25 Lorna Evans <lorna_evans_at_sil.org>

> I'm sending this on behalf of my coworker Sharon Correll:
>
> We've run into this dotted circle problem on our ScriptSource website
> (scriptsource.org) where we are trying to display diacritics from many
> scripts in isolation, using a wide variety of fonts, eg:
> http://scriptsource.org/block/0980.
> I wrote a blog post about the problem:
>
> http://scriptsource.org/entry/vr82t8n6pt
>
> We've seen single dotted circles, double dotted circles, no dotted
> circles, and even no diacritics at all. There is very little
> consistency, so it seems like some interaction between the
> font-smarts/rendering-engine/browser. I didn't investigate deeply
> enough to figure out which, but bottom line, there doesn't seem to be
> a good across-the-board solution.
>
> Solving it on a script-by-script basis is not ideal, but we might bite
> the bullet and do that if we could figure out something that would
> actually give reasonable results.
>
> Sharon Correll
> SIL International
>
>
>
Received on Fri Oct 25 2013 - 17:44:28 CDT
