Re: Rendering Raised FULL STOP between Digits

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Sat, 09 Mar 2013 14:41:11 -0800

On 3/9/2013 1:51 PM, Jukka K. Korpela wrote:
> 2013-03-09 21:30, Asmus Freytag wrote:
>
>> I believe the Unicode Standard should be fixed by explicitly removing
>> all suggestions in the text that the raised decimal point is unified
>> with 002E.
>
> That would be a good move if agreement can be found on the recommended
> coding of the middle dot.
>
>> Second, the standard should be amended by identifying which character is
>> to be used instead for this purpose.
>>
>> It might be something like 00B7.
>
> There are several reasons why that would be a bad move. First, 00B7 is
> a seriously overloaded character already.
As is 002E. Overloading characters is not ipso facto a bad thing.

The standard precedent in Unicode recognizes the need to primarily
support rendering differences that cannot be determined absent markup.
Only in very limited situations are characters of identical rendering
behavior repeated on the basis of properties alone. The most common case
of this exception is the dual coding of non-breaking characters (space,
dash, etc.). A special exception for bidi properties exists for Arabic
digits.
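
For anyone who wants to check these two exceptions against the character
database, here is a minimal sketch using Python's standard unicodedata
module. The choice of U+00A0/U+2011 as the non-breaking examples, and of
U+0660 versus U+06F0 as the "Arabic digits" case, is my own reading of
the paragraph above, not something the standard text spells out in these
terms.

```python
import unicodedata

# Non-breaking duplicates: each carries a <noBreak> compatibility mapping
# to its ordinary counterpart (space, hyphen).
NO_BREAK_SPACE = "\u00a0"
NON_BREAKING_HYPHEN = "\u2011"

def compat_mapping(ch: str) -> str:
    """Return the raw decomposition field from the Unicode Character Database."""
    return unicodedata.decomposition(ch)

# The two sets of Arabic-script digits differ in bidi class:
# ARABIC-INDIC DIGIT ZERO versus EXTENDED ARABIC-INDIC DIGIT ZERO.
ARABIC_INDIC_ZERO = "\u0660"
EXTENDED_ARABIC_INDIC_ZERO = "\u06f0"

def bidi_class(ch: str) -> str:
    """Return the Bidi_Class property value of a character."""
    return unicodedata.bidirectional(ch)

if __name__ == "__main__":
    for ch in (NO_BREAK_SPACE, NON_BREAKING_HYPHEN):
        print(f"U+{ord(ch):04X} decomposition: {compat_mapping(ch)}")
    for ch in (ARABIC_INDIC_ZERO, EXTENDED_ARABIC_INDIC_ZERO):
        print(f"U+{ord(ch):04X} bidi class: {bidi_class(ch)}")
```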

However, many characters, like dashes and dots, have multiple uses to
the human writer and reader, and despite some differences in processing
(line breaking etc.) the general approach is to overload the character
and let humans (and software) disambiguate it from context - which at
least humans can do as long as it renders properly. (The latter is the
reason, in my view, why Unicode tends to disunify primarily for rendering).
> Second, it’s a middle dot, which may differ from a raised dot.
> Mixed-language documents may well contain both British number
> notations and occurrences of middle dot in various contexts, and it
> should be possible to make them appear as different.

I would agree with that concern if you could demonstrate, with the usual
evidence, that there is a distinction. Note that 8859-1 contains 00B7 at
B7 and this will have been used by anyone needing a raised dot and not
having a font that "magically" supplies one on context. (As James and
Richard have pointed out, that kind of font technology does not exist,
and there seems to be no interest by vendors to supply it - hence
underscoring the need for a different character).
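
To make the 8859-1 point concrete, a small sketch in Python: byte B7
decodes to 00B7, a number written with 00B7 survives a round trip
through Latin-1, and 22C5 cannot be represented there at all. The helper
name can_encode_latin1 and the sample number are purely illustrative.

```python
MIDDLE_DOT = "\u00b7"     # MIDDLE DOT
DOT_OPERATOR = "\u22c5"   # DOT OPERATOR

def can_encode_latin1(text: str) -> bool:
    """Report whether every character of text exists in ISO 8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Byte 0xB7 in ISO 8859-1 is the middle dot.
decoded = bytes([0xB7]).decode("latin-1")

# A number written with a raised dot, stored as Latin-1 bytes.
encoded = ("3" + MIDDLE_DOT + "14").encode("latin-1")

if __name__ == "__main__":
    print(f"0xB7 decodes to U+{ord(decoded):04X}")
    print("encoded bytes:", encoded)
    print("U+22C5 available in Latin-1:", can_encode_latin1(DOT_OPERATOR))
```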
>
> Due to another unfortunate unification (or semi-unification), 0387
> (Greek ano teleia) has been defined as canonical equivalent to 00B7,
> with the note “00B7 is the preferred character”. This means that glyph
> design for 00B7 needs to take this into account, and since Greek ano
> teleia isn’t really a middle dot (rather, an upper dot, appearing
> roughly around the x-height of a font, rather than at half of
> x-height, which is a natural position for middle dot).

This appears to be another possible mistake. However, the Greek script
does provide a context which could be used to select the "ano teleia"
appearance and properties (unless you tell me that the character appears
in Greek surrounded by non-Greek alphabet characters).
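
A rough sketch of what such a context test might look like, in Python.
Normalization folds 0387 into 00B7 (that part is simply what the
canonical equivalence means), so any Greek-specific treatment has to be
recovered from the neighbors. The function names are hypothetical, and
testing for "GREEK" in the character name is only a stand-in for a real
Script property lookup, which the standard library does not offer.

```python
import unicodedata

ANO_TELEIA = "\u0387"   # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"   # MIDDLE DOT

def is_greek_letter(ch: str) -> bool:
    """Approximate a Script=Greek test through the character name (assumption)."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("GREEK")

def is_ano_teleia_context(text: str, i: int) -> bool:
    """True if text[i] is U+00B7 directly preceded by a Greek letter."""
    if text[i] != MIDDLE_DOT or i == 0:
        return False
    return is_greek_letter(text[i - 1])

# After normalization the ano teleia is indistinguishable from 00B7.
sample = unicodedata.normalize("NFC", "αβγ" + ANO_TELEIA + " 3" + MIDDLE_DOT + "5")

if __name__ == "__main__":
    for i, ch in enumerate(sample):
        if ch == MIDDLE_DOT:
            print(i, "Greek context" if is_ano_teleia_context(sample, i) else "other")
```

This heuristic would of course fail in exactly the case mentioned above,
a dot in Greek text surrounded by non-Greek characters.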
>
> The code chart comment on 002E (full stop) says: “may be rendered as a
> raised decimal point in old style numbers”. But checking a few fonts
> that use the OpenType feature for old style numbers (onum), I was
> unable to find any that has such a glyph selectable that way.

Yes, this comment makes no sense. It was a pious wish by the character
encoders during the early days of Unicode. It has not been picked up by
anyone in 20 years, so far as we know, which means it should be
recognized for what it is: an evolutionary dead branch which needs to
be trimmed.
>
> I wonder what character and techniques British publishers use to
> produce notations with a raised dot. Is it 002E, with typographic
> tools used to raise it, or is it 00B7?

I agree, data would help settle this. Richard?
>
>> I believe that is
>> entirely possible, and non-disruptive, insofar as numeric use of 00B7
>> does not exist for any purpose other than showing a raised decimal point
>
> I’m afraid there is mathematical use of 00B7. It is tempting to use it
> as a multiplication dot (as in 2 · 2, meaning the same as 2 × 2),
> especially if you are limited to using ISO Latin 1 repertoire or you
> find 00B7 essentially simpler to type than 22C5 (dot operator).
> Standards have been vague or ignorant of the issue (now ISO 80000-2
> explicitly defines the multiplication dot as 22C5, but I wonder how
> many people know about this).

For mathematical notation, the mathematical publishers are well
organized and have agreements on how to handle issues like that (hence
the ISO standard). The fact that some individual authors might have used
00B7 as a fallback (or out of ignorance) is not really relevant here.
For rendering it's not an issue, and for automatic parsing it's like any
other typo.
>
> Especially if the middle dot is used as multiplication symbol without
> spaces around it, confusion would be guaranteed.

Human readers don't "read" the code points.
>
>> If that alternative is deemed not acceptable, the only remaining choice
>> would be to add a new character. (I would recommend that only as the
>> last resort).
>
> I would recommend that as the right approach. It will not fix the
> problem anytime soon, but it’s a move in the right direction.

Adding more raised dots of identical appearance is not really a good
answer. There's tremendous cost involved anytime we duplicate something
that readers can't tell apart without looking under the hood.

A./
Received on Sat Mar 09 2013 - 16:42:32 CST
