Re: Dotted Circle plus Combining Mark as Text

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Tue, 22 Oct 2013 01:43:47 +0100

On Tue, 22 Oct 2013 01:40:39 +0200
Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:

> You still don't understand: I want the composite to behave as if it
> was a letter that is missing and it is supposed to replace (including
> in the middle of a word). There's no attempt to insert a line break
> (in fact I don't want it before or after, unless there are breaking
> characters around such as punctuation or spaces).

By almost all that's in the Unicode standard, placeholder base
character plus combining mark (2 characters in total) should render as
though the placeholder were a letter. No control character should be
necessary - gluing them together with WJ would not improve things. The
only exception is that TUS cautions that some combinations may not
render well.
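
To make the "2 characters in total" point concrete, here is a minimal
Python sketch that inspects the code points involved using the standard
unicodedata module. The helper name describe() is only illustrative, and
the choice of U+25CC as the placeholder and U+0E31 as the mark is an
assumption made for the example.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"   # placeholder base character
MAI_HAN_AKAT = "\u0E31"    # Thai combining mark
WORD_JOINER = "\u2060"     # WJ, a format control character

def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]

# The placeholder sequence is just two code points, with no control
# character between them.
placeholder_seq = DOTTED_CIRCLE + MAI_HAN_AKAT

for row in describe(placeholder_seq + WORD_JOINER):
    print(row)
```

The mark is reported as a nonspacing mark (Mn) and WJ as a format
character (Cf); nothing in these properties requires WJ between the
placeholder and the mark.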

I tried to think where WJ might make sense between a base character and
a combining mark. I could think of only two cases, and in these cases it
would apply even within normal words. The first is where a hyphenator,
trying to improve line-breaking, decides to insert a line break visually
between a base and a spacing combining mark (Mc). Such breaks do occur. A
WJ might overrule this behaviour.
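
As a rough illustration of the first case, the following Python sketch
lists the positions where a spacing combining mark (general category
Mc) follows another character, i.e. the places where such a visual
break could fall. The function name mc_positions() and the Devanagari
sample are assumptions for the example; it does not model any
particular hyphenator.

```python
import unicodedata

def mc_positions(text):
    """Indices i such that text[i] is a spacing combining mark (Mc).

    A break placed immediately before index i would separate the mark
    from the character it follows.
    """
    return [
        i for i in range(1, len(text))
        if unicodedata.category(text[i]) == "Mc"
    ]

# DEVANAGARI LETTER KA followed by DEVANAGARI VOWEL SIGN AA (an Mc mark).
sample = "\u0915\u093E"
print(mc_positions(sample))
```

Thai U+0E31 is a nonspacing mark (Mn), so a sequence such as
<U+0E01, U+0E31> yields no positions with this check.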

The second is where text is being split into words, e.g. Sanskrit
text. Now a WJ would not always be able to help, for sometimes a vowel
character must be split between two words.

I've just looked at the Uniscribe behaviour in detail. I gave it a
sequence <U+0E01 THAI CHARACTER KO KAI, U+25CC DOTTED CIRCLE, U+0E31
THAI CHARACTER MAI HAN-AKAT, U+0E01, U+0E31> to be rendered with the
Angsana New font, a font designed for Thai. Uniscribe categorised the
string as an unbroken run of Thai characters. Now, the font explicitly
defines how U+25CC and U+0E31 are to be combined in a Thai script run,
so I don't see how the font can be regarded as broken. Uniscribe
nevertheless insists that the sequence is faulty, and converts the
first U+0E31 into two glyphs, those for U+25CC and U+0E31. Uniscribe
is clearly just being restrictive in what characters a Thai combining
mark may be attached to in the backing storage.
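
For anyone wanting to reproduce the test string, here is a small Python
sketch that builds the sequence and groups each combining mark with the
character before it. The grouping rule is a deliberately naive
approximation (any character whose general category starts with "M"
joins the preceding cluster); it is not Uniscribe's logic, nor a full
grapheme cluster segmentation, and naive_clusters() is just an
illustrative name.

```python
import unicodedata

# <U+0E01, U+25CC, U+0E31, U+0E01, U+0E31>
SEQ = "\u0E01\u25CC\u0E31\u0E01\u0E31"

def naive_clusters(text):
    """Attach every combining mark (category M*) to the preceding character."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

for cluster in naive_clusters(SEQ):
    print(" ".join(f"U+{ord(ch):04X}" for ch in cluster))
```

Under this simple rule the first U+0E31 belongs with the U+25CC in the
backing store, which is the combination the font defines.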

Richard.
Received on Mon Oct 21 2013 - 19:46:32 CDT
