Re: Dotted Circle plus Combining Mark as Text

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Sun, 20 Oct 2013 12:32:53 -0700

On 10/20/2013 1:47 AM, Jukka K. Korpela wrote:
> 2013-10-20 2:38, Richard Wordingham wrote:
>
>> Is a sequence of a U+25CC DOTTED CIRCLE plus a combining mark plain
>> text?
>
> Well, is <h1>hello<h1> plain text? The answer is that any string of
> characters may be considered as plain text and any string of
> characters may be treated as rich text according to some conventions.
>

Unless some such conventions have been established, a string of
character codes is plain text.

A random implementation choice is a bug, not a convention.

Just because Unicode does not provide a method to announce or register a
convention doesn't mean all behavior should reverently be treated as a
convention.

>> If so, how many dotted circles should appear?
>
> Possibly none. An implementation need not support any particular
> collection of characters. But an implementation that supports U+25CC
> must treat it as a spacing character, and an implementation that
> supports e.g. U+0300 must treat it as a combining mark. So if the
> implementation is capable of visually rendering them, it shall render
> U+25CC U+0300 as a dotted circle with a grave accent above it. In
> this case, exactly one dotted circle should appear, then.
>
> Implementations often have bugs in dealing with combining marks. This
> may depend on the rendering software, or on the font.

And bugs are bugs and not conventions.
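
For anyone who wants to check the character properties behind the
"spacing character" versus "combining mark" distinction, here is a
minimal sketch using Python's standard unicodedata module. The helper
names describe and is_mark are my own, for illustration only; the
results reflect whatever Unicode version your Python build ships with.

```python
import unicodedata


def describe(ch):
    """Return (code point, name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


def is_mark(ch):
    """True for Mn, Mc and Me, i.e. the combining mark categories."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


if __name__ == "__main__":
    # The three characters discussed in this thread.
    for ch in "\u25CC\u0300\u0E31":
        print(describe(ch), "mark" if is_mark(ch) else "base")
```

U+25CC comes back as a symbol (category So), while U+0300 and U+0E31
are both nonspacing marks (Mn), so a conforming renderer has exactly
one spacing character to draw in <U+25CC, U+0300>.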
>
>> If the sequence is not plain text, what mark-up notations are
>> available to control the number of dotted circles produced? I
>> am particularly interested in notation for HTML, e.g. via a style
>> sheet. Should the sequence instead be treated as a graphic?
>
> I don’t understand these questions. If the sequence is treated as
> other than plain text, then the results depend on the specific “rich
> text” or other conventions applied.

A typical convention is the "show special characters" mode in many
editors. Such a feature could make combining marks visible by forcing
them to appear as isolated marks (not part of a sequence), each over a
dotted circle.

There are some conventions that show an extra dotted circle for certain
ill-formed sequences involving combining marks. Script-specific
combining marks may indeed have contexts in which they make no possible
sense. General purpose combining marks are not so restricted and to show
dotted circles with them is a bug.
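
To make "ill-formed sequence" concrete, here is a rough sketch, again
in Python with only the standard unicodedata module. It groups text
into base-plus-marks sequences and flags those that start with a mark
and so have no base. This is a deliberate simplification of my own and
not the segmentation rules of the standard: it ignores controls, line
breaks and script-specific constraints. The function names
combining_sequences and lacks_base are hypothetical.

```python
import unicodedata


def is_mark(ch):
    """True for Mn, Mc and Me, i.e. the combining mark categories."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def combining_sequences(text):
    """Split text so that each mark attaches to the preceding character.

    Simplified on purpose: a mark only ends up alone when nothing at
    all precedes it in the text.
    """
    sequences = []
    for ch in text:
        if is_mark(ch) and sequences:
            sequences[-1] += ch  # attach the mark to the current sequence
        else:
            sequences.append(ch)  # a base, or a leading mark with no base
    return sequences


def lacks_base(sequence):
    """A sequence that begins with a mark has no base character."""
    return is_mark(sequence[0])


if __name__ == "__main__":
    sample = "\u25CC\u0E31\u25CC"  # the notation discussed in this thread
    for seq in combining_sequences(sample):
        codes = " ".join(f"U+{ord(c):04X}" for c in seq)
        print(codes, "(no base)" if lacks_base(seq) else "")
```

Under this simplified grouping, <U+25CC, U+0E31, U+25CC> yields two
sequences, each with its own base, so there is no sequence without a
base and nothing that would call for an inserted dotted circle.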

Incidentally, the dotted circle shown in the Unicode Code charts is
*not* 25CC, and if I were to implement a "show dotted circle" feature in
a program I would not use 25CC for this - that character has a standard
glyph of rather unsuitable metrics for the purpose, never mind that many
people have co-opted it.
>
>> This question is prompted by a confused discussion of what the notation
>> <U+25CC, U+0E31 THAI CHARACTER MAI HAN-AKAT, U+25CC> on a web page
>> meant.
>
> What it means is a different issue. U+25CC is a symbol that can be
> used in a variety of meanings. I don’t think it means anything
> specific to most people, unless a definition is given. U+0E31 is a
> Thai vowel sign, and I don’t think any meaning in general has been
> assigned to it when applied to something else than a Thai letter.
>
> The rendering of the sequence is a different matter. Not surprisingly,
> tests on IE 10 show varying results. Using my test page
> http://www.cs.tut.fi/~jkorpela/listfonts1.html
> that renders, on IE, a given string in all the fonts available in the
> system, I noticed that on my system, only SunExt-A and Unifont result
> in correct rendering. Using Arial Unicode MS, the rendering is correct
> except for the circles being dashed, and I think this is incorrect for
> U+25CC, as it violates the identity of the character as a dotted circle.

Not really - if you go back to the originals, e.g. early versions of
Unicode, you see dashed circles. Unicode 2.0 clearly shows a dashed
circle, and that edition, I believe, dates from around the first use
of outline fonts for the code charts.

> A few other fonts contain the characters too, but the renderings have
> three similar dotted rings, with the Thai diacritic above the middle
> one or (in FreeSerif and Quivira) between the 2nd and 3rd. – On
> Chrome, Safari, and Firefox, the results are similar, except that
> Chrome shows the string as broken even when Arial Unicode MS is declared.
>
>> The confusion was caused because some of us saw two dashed
>> circles and others saw three dashed circles (one for each character)
>> when viewing the web page.
>
> The implementations that show three dotted circles are non-conforming.
> Showing three dashed circles would be even more non-conforming.
>
> If the purpose is to display the combining diacritic the same way as
> in the code charts in the standard, i.e. with a dotted symbol
> appearing as generically showing the place of a base character, then
> I’m afraid the approach does not work in general. It should work, in
> the sense that conforming implementations would render it the desired
> way if they support the characters in rendering, but web browsers just
> don’t conform.

Except that there is no character in the standard that matches (by
identity) the dotted glyph used in the code charts.

A./
>
> What you could do in a web page is to put U+00A0 U+25CC in one element
> and U+0E31 in another and position the elements in the same place, set
> to have the same width and to be horizontally centered. But I’m afraid
> this would be off-topic here and could involve some nasty details.
>
> Yucca
Received on Sun Oct 20 2013 - 14:35:15 CDT
