Re: Dotted Circle plus Combining Mark as Text

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sun, 20 Oct 2013 15:30:47 +0100

On Sun, 20 Oct 2013 11:47:23 +0300
"Jukka K. Korpela" <jkorpela_at_cs.tut.fi> wrote:

> 2013-10-20 2:38, Richard Wordingham wrote:
 
> > Is a sequence of a U+25CC DOTTED CIRCLE plus a combining mark plain
> > text?

> The answer is that any string of
> characters may be considered as plain text and any string of
> characters may be treated as rich text according to some conventions.

The correct phrasing of the question is, I suppose, "Can the combination
of U+25CC DOTTED CIRCLE plus a combining mark be represented as plain
text?". Fortunately, everyone has understood the question.

> > If so, how many dotted circles should appear?

> Possibly none. An implementation need not support any particular
> collection of characters. But an implementation that supports U+25CC
> must treat it as a spacing character, and an implementation that
> supports e.g. U+0300 must treat it as a combining mark. So if the
> implementation is capable of visually rendering them, it shall render
> U+25CC U+0300 as a dotted circle with a grave accent above it. In
> this case, exactly one dotted circle should appear, then.

I fear the get-out clause is that an implementation doesn't support a
collection of characters, but rather a collection of strings. Many
renderers supporting Thai don't support the Thai character sequence
<DO, II, II> (U+0E14, U+0E35, U+0E35) in any useful fashion, instead
allowing the second II to overstrike the first, sometimes in such a way
that the author does not realise he has double-struck the II character.
This accidental non-standard sequence is surprisingly common on the
Internet. In the OpenType world, this is probably a font issue.
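
Since the doubled mark can be invisible on screen, it may be easier to
look for it in the code points than in the rendering. The following is
only a sketch of such a check in Python, using the standard unicodedata
module; the function name doubled_marks is my own invention, and
testing for general category Mn is an assumption about what counts as
a mark here, not anything a renderer does.

```python
import unicodedata

def doubled_marks(text):
    """Return (index, name) for each nonspacing mark that repeats the character before it."""
    hits = []
    for i in range(1, len(text)):
        ch = text[i]
        # General category "Mn" (nonspacing mark) covers Thai above/below vowels and tone marks.
        if ch == text[i - 1] and unicodedata.category(ch) == "Mn":
            hits.append((i, unicodedata.name(ch, "UNNAMED")))
    return hits

sample = "\u0E14\u0E35\u0E35"  # <DO, II, II>
print(doubled_marks(sample))
```

This only catches an immediately repeated mark; it says nothing about
whether a given font or shaping engine would display the sequence
sensibly.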

> > If the sequence is not plain text, what mark-up notations are
> > available to control the number of dotted circles produced? I
> > am particularly interested in notation for HTML, e.g. via a style
> > sheet. Should the sequence instead be treated as a graphic?

> I don’t understand these questions. If the sequence is treated as
> other than plain text, then the results depend on the specific “rich
> text” or other conventions applied.

If it can be argued that the extra dotted circle is valid, it would be
convenient for web authors to have a mechanism to suppress it, rather
as numeric format controls often allow control over the presence of an
optional plus sign. I was wondering if this issue had already been
addressed in mark-up language.

> What it means is a different issue. U+25CC is a symbol that can be
> used in a variety of meanings. I don’t think it means anything
> specific to most people, unless a definition is given. U+0E31 is a
> Thai vowel sign, and I don’t think any meaning in general has been
> assigned to it when applied to something else than a Thai letter.

In the context prompting the question, it is an explicit placeholder
for a consonant. The usual symbol used by Thais (or, at least, their
textbook writers) is a dash, though the dash characters I tried had the
same problems with Uniscribe - dotted circles sprouted. At least the
hyphen-minus is available on Thai keyboard layouts. When naming the
vowels, o ang is used, but, alas, this is not suitable in the said
context.
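
For anyone wanting to reproduce the placeholder sequences, the
character properties involved can be listed with a few lines of
Python. This is just an illustrative sketch; the helper name describe
is hypothetical, and it reports only what the unicodedata module knows
about each code point, not how Uniscribe or any other renderer will
treat the sequence.

```python
import unicodedata

def describe(text):
    """List (code point, general category, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))
        for ch in text
    ]

# Dotted circle as the explicit base, followed by a combining mark.
print(describe("\u25CC\u0E31"))  # DOTTED CIRCLE + THAI CHARACTER MAI HAN-AKAT
print(describe("\u25CC\u0300"))  # DOTTED CIRCLE + COMBINING GRAVE ACCENT
# Hyphen-minus as the base instead, as a keyboard-friendly alternative.
print(describe("-\u0E31"))
```

In each case the base is a spacing character (combining class 0) and
the following character has category Mn, so on the face of it a single
base with the mark attached is what one would expect to see.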

Richard.
Received on Sun Oct 20 2013 - 09:33:04 CDT
