Re: Dotted Circle plus Combining Mark as Text

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Sun, 20 Oct 2013 11:47:23 +0300

2013-10-20 2:38, Richard Wordingham wrote:

> Is a sequence of a U+25CC DOTTED CIRCLE plus a combining mark plain
> text?

Well, is <h1>hello<h1> plain text? The answer is that any string of
characters may be considered as plain text and any string of characters
may be treated as rich text according to some conventions.

> If so, how many dotted circles should appear?

Possibly none. An implementation need not support any particular
collection of characters. But an implementation that supports U+25CC
must treat it as a spacing character, and an implementation that
supports e.g. U+0300 must treat it as a combining mark. So if the
implementation is capable of visually rendering them, it shall render
U+25CC U+0300 as a dotted circle with a grave accent above it. In this
case, then, exactly one dotted circle should appear.
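
The character properties behind this can be looked up in the Unicode
Character Database. A minimal sketch, using Python's standard
unicodedata module (the helper name describe is only for illustration,
and the values reported depend on the Unicode version bundled with the
Python build):

```python
import unicodedata

def describe(ch):
    """Return (code point, name, general category, canonical combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

# U+25CC is a symbol (So) with combining class 0, i.e. a spacing base;
# U+0300 is a nonspacing mark (Mn) with combining class 230 (above).
for ch in "\u25CC\u0300":
    print(describe(ch))
```

The mark has no base of its own in this sequence, so it attaches to the
preceding U+25CC.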

Implementations often have bugs in dealing with combining marks. The
cause may lie in the rendering software or in the font.

> If the sequence is not plain text, what mark-up notations are
> available to control the number of dotted circles produced? I
> am particularly interested in notation for HTML, e.g. via a style
> sheet. Should the sequence instead be treated as a graphic?

I don’t understand these questions. If the sequence is treated as other
than plain text, then the results depend on the specific “rich text” or
other conventions applied.

> This question is prompted by a confused discussion of what the notation
> <U+25CC, U+0E31 THAI CHARACTER MAI HAN-AKAT, U+25CC> on a web page
> meant.

What it means is a different issue. U+25CC is a symbol that can be used
with a variety of meanings. I don’t think it means anything specific to
most people, unless a definition is given. U+0E31 is a Thai vowel sign,
and I don’t think any general meaning has been assigned to it when
applied to something other than a Thai letter.

The rendering of the sequence is a different matter. Not surprisingly,
tests on IE 10 show varying results. Using my test page
http://www.cs.tut.fi/~jkorpela/listfonts1.html
that renders, on IE, a given string in all the fonts available in the
system, I noticed that on my system, only SunExt-A and Unifont result in
correct rendering. Using Arial Unicode MS, the rendering is correct
except for the circles being dashed, and I think this is incorrect for
U+25CC, as it violates the identity of the character as a dotted circle.
A few other fonts contain the characters too, but the renderings have
three similar dotted rings, with the Thai diacritic above the middle one
or (in FreeSerif and Quivira) between the 2nd and 3rd. – On Chrome,
Safari, and Firefox, the results are similar, except that Chrome shows
the string as broken even when Arial Unicode MS is declared.

> The confusion was caused because some of us saw two dashed
> circles and others saw three dashed circles (one for each character)
> when viewing the web page.

The implementations that show three dotted circles are non-conforming.
Showing three dashed circles would be even more non-conforming.
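
To illustrate why two is the expected count, here is a rough sketch that
groups each base character with the marks following it. It is a
simplification, not the full grapheme cluster algorithm of UAX #29, and
the function name simple_clusters is hypothetical:

```python
import unicodedata

def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification: any character whose general category starts with "M"
    joins the preceding cluster; every other character starts a new one.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

seq = "\u25CC\u0E31\u25CC"  # <U+25CC, U+0E31, U+25CC>
for cluster in simple_clusters(seq):
    print(" ".join(f"U+{ord(c):04X}" for c in cluster))
```

Under this grouping the sequence forms two units, <U+25CC U+0E31> and
<U+25CC>, each containing one dotted circle. A third circle can only
come from the renderer inserting its own placeholder for the mark.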

If the purpose is to display the combining diacritic the same way as in
the code charts in the standard, i.e. with a dotted circle generically
marking the place of a base character, then I’m afraid the approach does
not work in general. It should work, in the sense that conforming
implementations would render it the desired way if they support the
characters in rendering, but web browsers just don’t conform.

What you could do in a web page is to put U+00A0 U+25CC in one element
and U+0E31 in another and position the elements in the same place, set
to have the same width and to be horizontally centered. But I’m afraid
this would be off-topic here and could involve some nasty details.
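
For what it is worth, a rough and untested sketch of that idea, written
as a small Python helper that emits the markup. The function name
overlay_markup, the inline styles, and the 1em width and 1.5em height
are all assumptions for illustration; real pages would need tuning per
font, and a browser may still draw its own dotted circle for the
isolated mark:

```python
import html

def overlay_markup(base="\u00A0\u25CC", mark="\u0E31", width="1em"):
    """Return HTML that stacks two spans of equal width at the same position.

    Assumed layout: a relatively positioned inline-block wrapper holding
    two absolutely positioned spans, both horizontally centered.
    """
    inner = f"position:absolute;left:0;top:0;width:{width};text-align:center"
    outer = f"position:relative;display:inline-block;width:{width};height:1.5em"
    return (
        f'<span style="{outer}">'
        f'<span style="{inner}">{html.escape(base)}</span>'
        f'<span style="{inner}">{html.escape(mark)}</span>'
        "</span>"
    )

print(overlay_markup())
```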

Yucca
Received on Sun Oct 20 2013 - 03:50:57 CDT
