Re: Rendering Raised FULL STOP between Digits from Richard Wordingham on 2013-03-10 (Unicode Mail List Archive)

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sun, 10 Mar 2013 21:36:16 +0000

On Sun, 10 Mar 2013 17:22:05 +0200
"Jukka K. Korpela" <jkorpela_at_cs.tut.fi> wrote:

> 2013-03-10 4:57, Asmus Freytag wrote:
>
> >> 'The Lancet' reportedly insists on the use of the raised decimal
> >> point
> […
> > That's sensible advice, in a way, because B7 is in 8859-1 and
> > therefore supported in a huge variety of fonts, for practical
> > purposes, the coverage among non-decorative text fonts is pretty
> > near universal.
>
> This probably implies that most people who wish or need to use a
> raised dot will keep using B7. And that’s fine for most purposes.
>
> A new character would *allow* people to use raised dot, for which
> fonts could contain suitable renderings that are independent of the
> demands of B7 (especially due to its intended primary use as middle
> dot in some languages). This would mostly be relevant to accurate
> coding of old documents rather than to everyday needs of British
> writers.

The Greek punctuation mark U+00B7 (an upper dot) is also under some
stress. If it aligns with the top of the preceding character, as
apparently it should for Greek letters, that causes some strain for
rendering "MS-DOS<ano teleia>" or "Windows 7<ano teleia>" in Greek
text.
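
Part of the stress on U+00B7 comes from normalization: U+0387 GREEK ANO TELEIA has a singleton canonical decomposition to U+00B7, so normalized Greek text ends up with the middle dot. A minimal sketch of that check, assuming Python's standard unicodedata module (the helper name normalized_forms is made up for illustration):

```python
import unicodedata

ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"  # MIDDLE DOT

def normalized_forms(ch):
    """Return the (NFC, NFD) forms of a single character."""
    return unicodedata.normalize("NFC", ch), unicodedata.normalize("NFD", ch)

nfc, nfd = normalized_forms(ANO_TELEIA)
# Both forms are printed as code points so the result is visible in plain text.
print(f"NFC: U+{ord(nfc):04X}  NFD: U+{ord(nfd):04X}")
```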

The existence of unambiguous leading and trailing decimal points argues
for the decimal point having a bidi class EN! Does anyone use ano
teleia for right-to-left text? Perhaps one will just have to protect
leading and trailing decimal points with directionality controls, in
which case a bidi class of ES will suffice for decimal points flanked by
digits.
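
For comparison, the bidi classes currently assigned to the characters involved can be looked up with Python's standard unicodedata module. This is only a lookup sketch (the helper name bidi_class is made up here), and the values reported depend on the Unicode version bundled with the interpreter; FULL STOP is CS and U+00B7 is ON, against EN for the digits and ES for the plus sign.

```python
import unicodedata

def bidi_class(ch):
    """Return the Bidi_Class short name of a single character."""
    return unicodedata.bidirectional(ch)

# FULL STOP, MIDDLE DOT, ARABIC DECIMAL SEPARATOR, PLUS SIGN, DIGIT FIVE
for ch in (".", "\u00b7", "\u066b", "+", "5"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bidi_class(ch)}")
```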

The line-breaking class of U+00B7 is currently AL (alphabetic); a
decimal point needs NU (numeric), which is slightly more restrictive.
Making NU the line breaking class of U+00B7 would not hurt.

The value of Word_Break for the decimal point should be Numeric, like
its forebear U+066B ARABIC DECIMAL SEPARATOR. U+00B7 has the Word_Break
value MidLetter. A value of MidNumLet would work for U+00B7, and would
handle decimal points between digits. This separates leading and
trailing decimal points from the rest of the number, but is no worse
than the current situation with FULL STOP. Leading U+00B7 could be
dealt with by a special rule. For trailing decimal points, arguably a
defective notation, the only completely robust solution is to rely on
the lack of a word break being marked manually. I believe we ought to
add general rules of the form

Any × U+2060
U+2060 × Any
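
The effect of these rules can be sketched with a toy word segmenter. This is not the UAX #29 algorithm: the classes are crude approximations based on str.isdigit and str.isalpha, U+00B7 is given the MidNumLet value suggested above rather than its actual MidLetter value, and the function names are made up for illustration.

```python
WJ = "\u2060"   # WORD JOINER
DOT = "\u00b7"  # MIDDLE DOT

def wb_class(ch):
    """Toy Word_Break classes; U+00B7 gets the proposed MidNumLet value."""
    if ch == DOT:
        return "MidNumLet"
    if ch == WJ:
        return "WJ"
    if ch.isdigit():
        return "Numeric"
    if ch.isalpha():
        return "ALetter"
    return "Other"

def no_break(text, i):
    """True if no word boundary falls between text[i-1] and text[i]."""
    prev, cur = wb_class(text[i - 1]), wb_class(text[i])
    nxt = wb_class(text[i + 1]) if i + 1 < len(text) else None
    prev2 = wb_class(text[i - 2]) if i >= 2 else None
    core = ("ALetter", "Numeric")
    # Proposed general rules: Any x U+2060 and U+2060 x Any.
    if prev == "WJ" or cur == "WJ":
        return True
    # Runs of letters and digits stay together.
    if prev in core and cur in core:
        return True
    # A dot flanked by two letters or by two digits stays attached.
    if cur == "MidNumLet" and prev in core and prev == nxt:
        return True
    if prev == "MidNumLet" and cur in core and cur == prev2:
        return True
    return False

def words(text):
    """Split text at every position where no_break() does not apply."""
    if not text:
        return []
    out, start = [], 0
    for i in range(1, len(text)):
        if not no_break(text, i):
            out.append(text[start:i])
            start = i
    out.append(text[start:])
    return out

for sample in ("3\u00b714", "\u00b75", "5\u00b7", "5\u2060\u00b7"):
    print(repr(sample), words(sample))
```

In this sketch a dot between digits stays inside the number, leading and trailing dots are split off, and a manually inserted U+2060 keeps a trailing dot attached.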

> According to “A history of mathematical notations” by Florian Cajori,
> paragraph 286, the vertical position of the dot used as decimal
> separator varied a lot in the 19th century. It varied from just a
> little above the baseline up to the x-height and above, even to the
> top of lining figures! I would expect that 20th century typography
> had similar variation.

I haven't seen any such variation; the late 20th century seems to have
stabilised the form. Some of the variation may be due to changes in the
placement of the digits.

> Of course, Unicode cannot encode all the possible vertical positions
> (and sizes) of a raised dot. Such things would be normal glyph
> variation, for stylistic or other reasons. The point is that no such
> variation is realistic for B7.

If we unify U+00B7's three possible roles of (a) digraph breaker, (b)
ano teleia and (c) decimal point, we could have the following scheme:

(1) Before digit, use decimal point glyph;
(2) Else before letter, use digraph breaker glyph;
(3) Else after letter, use ano teleia glyph;
(4) Else after digit + WJ, use decimal point glyph;
(5) Else, use ano teleia glyph.
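
The five rules can be sketched as a function that picks a glyph class for one occurrence of U+00B7. This is an illustration only: "digit" and "letter" are approximated with str.isdigit and str.isalpha, and the function name and returned labels are made up here.

```python
WJ = "\u2060"   # WORD JOINER
DOT = "\u00b7"  # MIDDLE DOT

def glyph_for_dot(text, i):
    """Pick a glyph class for the U+00B7 at text[i], following rules (1)-(5)."""
    if text[i] != DOT:
        raise ValueError("text[i] is not U+00B7")
    # Empty strings stand in for missing neighbours; "".isdigit() and
    # "".isalpha() are both False, so they fall through the tests below.
    nxt = text[i + 1] if i + 1 < len(text) else ""
    prev = text[i - 1] if i > 0 else ""
    prev2 = text[i - 2] if i > 1 else ""
    if nxt.isdigit():                    # (1) before digit
        return "decimal point"
    if nxt.isalpha():                    # (2) else before letter
        return "digraph breaker"
    if prev.isalpha():                   # (3) else after letter
        return "ano teleia"
    if prev == WJ and prev2.isdigit():   # (4) else after digit + WJ
        return "decimal point"
    return "ano teleia"                  # (5) otherwise

for sample, pos in (("3\u00b714", 1), ("s\u00b7h", 1), ("Windows 7\u00b7", 9)):
    print(repr(sample), glyph_for_dot(sample, pos))
```

Under these rules a dot after "Windows 7" falls through to rule (5) and gets the ano teleia glyph, while a trailing decimal point needs the extra marker of rule (4).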

Unfortunately, using WJ is not legitimate. Possibly we could
distinguish digit + decimal point by encoding ZWNJ between them, on
the basis that in this case the glyph does not depend on the choice
of preceding digit.

Note that the cause of the complications is the use of U+00B7 to
represent ano teleia, not its rôle in sequences such as 's·h'.

How was it ever expected that a renderer could choose between
forms like "3. " and "3· " for <U+0033, U+002E, U+0020>?

Richard.

>
> Yucca
Received on Sun Mar 10 2013 - 16:41:04 CDT