Re: Superscript asterisk---reference (correction)

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Jul 02 1999 - 16:31:25 EDT


At 10:15 AM 7/2/99 -0700, Jonathan Coxhead wrote:
> The reference for the digamma function is really: Jeffreys and
>Jeffreys, Methods of Mathematical Physics (3rd ed), Cambridge
>University Press, 1978, p465. If encoded with the new mathematical
>symbols, the formula would appear as

Of course, I should have spotted the digamma myself. But I think it won't
be needed in the math styled alphabets as long as we don't have evidence of
a bold digamma being used distinctively.

>The first would appear in a renderer that implemented only up to
>Unicode 3.0 as
>
> ?
> <digamma>(?) = -- ??? ?!
> ??
>
>the second as
>
> <digamma>(z) = d/dz log z!
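
(For readability: the quoted relation, set in LaTeX -- `\digamma` here is the amssymb letter, standing in for whatever glyph the proposal assigns:)

```latex
% Jeffreys & Jeffreys' digamma relation, as quoted above
\digamma(z) = \frac{d}{dz} \log z!
```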

Coding the Math Styled Alphabetics makes (more) sense in the context of the
full analysis which showed us that we are missing about 30-60% of all the
non-alphabetic *symbols* needed for full coverage of mathematics. Once you
try to rely on Unicode for serious math publications, you will need to
require a version > 3.0 anyway. (Although the existing set is quite large,
and obviously weighted towards some of the more common notations).

>
> | As I wrote, we did consider combining marks as a means to
> | avoid combinatorics (but note that they are very different from the
> | stateful controls in your suggestion!) but felt that using that
> | mechanism to affect the style of a letter was inappropriate.
>
> I think there may be a small misunderstanding here: the PRESENTATION
>SUGGESTIONS are no more stateful than any other existing combining
>mark; START GROUP is no more stateful than, e g, LEFT-TO-RIGHT
>OVERRIDE.

No, this was not a misunderstanding on my part. I just spent the better part
of this week on getting LRO to work right in my sample implementation of
the bidi algorithm - a task made harder by the recent reformulation that
requires the algorithm to work the same "as if" it had been implemented in
a markup layer.

I would conclude that one really must want these characters very badly
before going to the trouble of supporting them in plain text (and that
chances are that most implementations will be buggy).
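The "as if it had been implemented in a markup layer" requirement amounts to a preprocessing pass that strips the explicit override controls and records equivalent style runs. A toy sketch of that idea (hypothetical helper, nothing like the full bidi algorithm -- it ignores the embedding controls, depth limits, and implicit levels entirely):

```python
# Reduce explicit-override controls (LRO/RLO/PDF) to style runs,
# mimicking what a markup parser would produce. Toy sketch only.
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"

def override_runs(text):
    """Return (stripped_text, runs), where runs is a list of
    (start, end, direction) spans over the stripped text."""
    out, runs, stack = [], [], []
    for ch in text:
        if ch in (LRO, RLO):
            # remember where this override run begins in the output
            stack.append((len(out), "L" if ch == LRO else "R"))
        elif ch == PDF:
            if stack:  # ignore an unmatched PDF, as the algorithm does
                start, direction = stack.pop()
                runs.append((start, len(out), direction))
        else:
            out.append(ch)
    return "".join(out), runs

stripped, runs = override_runs("abc" + RLO + "DEF" + PDF + "ghi")
# the controls are gone from the text; the run list carries their effect
```

Even this toy version shows where the pain comes from: once the controls are gone, every index into the text has shifted, which is exactly the bookkeeping a markup parser does for you.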

The saving grace of the bidi controls is that they can be ignored for all
other processes. This would not be true for presentation suggestions if
they were to carry semantic distinctions. (For example, sorting and
searching would be horribly complicated).
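That "can be ignored" property can be sketched as a fold applied before collation or search; a process that had to preserve semantic distinctions carried by presentation suggestions could not drop them this way:

```python
# The bidi controls carry no semantic distinction, so a search or
# sort key can simply filter them out.
BIDI_CONTROLS = {"\u200E", "\u200F",           # LRM, RLM
                 "\u202A", "\u202B", "\u202C", # LRE, RLE, PDF
                 "\u202D", "\u202E"}           # LRO, RLO

def fold_bidi(s):
    """Strip bidi controls; suitable as a pre-pass for compare/search."""
    return "".join(ch for ch in s if ch not in BIDI_CONTROLS)

assert fold_bidi("A\u202Ebc\u202C") == "Abc"
```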

> | I'm currently working out a reference implementation for bidi that
> | will implement these characters in a way that is *precisely*
> | identical to the way you would get with markup. One of the things I
> | am learning is, that short of duplicating the action of a markup
> | parser, i.e. removing the controls and replacing them by style runs,
> | it is *almost* impossible to get the right behavior.
>
> Yes, I've had a go at this too. It was the experience gained while
>doing it that led to my current ideas: I wanted to be able to render
>as much as possible with a minimal glyph set but a sophisticated
>renderer. The idea of the PRESENTATION SUGGESTIONS as productive
>combining marks initially arose purely out of a desire to do something
>more regular with those annoying little compatibility flags written in
>the Unicode Standard as <black-letter>, <double-struck>, etc.
>
> The fact that you could do a surprisingly good job of maths and the
>Finno-Ugric Phonetic Alphabet (which I hadn't heard of then) with just
>13 new characters came as something of a surprise, which is why I wanted
>to share it.
>
> | If the bidi algorithm had been invented after the rise of HTML, it
> | might have well been the case that we would NOT have coded these
> | characters.
>
> I would dispute the necessity of history here: from my outsider's
>perspective (and I follow Unicode, H T M L, X M L, Math M L, etc,
>fairly closely), it seemed as though the "dir = rtl" and "dir = ltr"
>attributes introduced into H T M L 4.0 were just a higher-level clone
>of the features already in Unicode. In other words, if the characters
>hadn't been in Unicode, I doubt they would have got into H T M L.

Similar feature sets were in existing word processors before Unicode; while
it's a fact that HTML 4.0 owes a bit to the work done by Unicode and those
individuals active on both efforts, the way the various markup languages
have responded to the needs of implementers would have led to an
equivalent form of bidi support sooner or later. My point was: if, at the
time, HTML had been as widely supported, we might have had an easier time
drawing a more restrictive line for plain text. However, the line is by
definition fuzzy, and there is some level at which reasonable people can
disagree -- that's why we need committees to ratify the consensus answer,
so we can all move on.

> | I won't argue elegance with you (or anyone here). My current
> | understanding of this issue has come to where I think that
> | mathematicians' use of letters and ordinary text use of letters are
> | distinct. In math, it makes no sense to have both a LATIN CAPITAL A
> | and a GREEK CAPITAL ALPHA. With most fonts, readers can't tell them
> | apart, and unlike text there is no word-context for a variable.
> | Computer Modern fonts all explicitly unified these letters in all
> | styles.
>
> The trouble with this viewpoint is the fact that, once there are
>italic, bold, etc, variants of the Latin alphabet in Unicode, people
>are going to use them in written text for emphasis, or to be cute,
>or whatever. There is no reason not to, after all, and no way to stop
>them, and the results will look much nicer. Mathematicians may argue,
>"But those are *our* characters!", but no-one will listen: the new
>characters just will not be confined to only a mathematical usage. I
>guarantee it!

This is precisely the reason why I and other people are in favor of
restricting the math-styled alphabetics to the smallest set required by
math (and to code them in plane 1).
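A plane-1 allocation makes such a mapping mechanical for software that wants it. A minimal sketch, assuming a contiguous bold Latin block (the base values below are where the Mathematical Alphanumeric Symbols block was ultimately placed, at U+1D400; the committee's final layout is not assumed here):

```python
# Map ASCII letters to plane-1 math-bold letters, assuming the bold
# block is contiguous starting at U+1D400 (upper) / U+1D41A (lower).
BOLD_A, BOLD_LOWER_A = 0x1D400, 0x1D41A

def math_bold(s):
    out = []
    for ch in s:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER_A + ord(ch) - ord("a")))
        else:
            out.append(ch)  # pass everything else through unchanged
    return "".join(out)
```

Note that the bold block is the easy case; the italic set has holes (e.g. the existing PLANCK CONSTANT for italic h), so a real mapping is table-driven rather than arithmetic.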
>
> I suppose my perspective is that a character encoding belongs to its
>user base, not the standards body that encodes it, in much the same way
>that a language belongs to its speaker community. The job of the
>standardiser is to provide tools for expression: attempts to define
>rules that limit the expression will fail anyway, so are pointless.

Yes and No. On the face of it, you pretty much describe a strong sentiment
in the Unicode Consortium. We usually don't try to place a priori
restrictions on the use of characters. But there are several areas where we
give very formal rules. The interpretations of the UTF-8 transformation,
UTF-16 transformation, or the bidi algorithm are each a case in point,
where some interpretations and uses are 'ill-formed' or 'illegal'.
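To make the UTF-8 case concrete: well-formedness is normative, so a conformant decoder must reject, not quietly accept, an ill-formed sequence such as an overlong encoding. A quick illustration (Python's codecs happen to enforce this):

```python
# 0xC0 0xAF is an overlong two-byte encoding of "/" (U+002F).
# A conformant UTF-8 decoder must treat it as ill-formed.
bad = b"\xC0\xAF"
try:
    bad.decode("utf-8")
    conformant = False  # decoding an overlong form would be an error
except UnicodeDecodeError:
    conformant = True   # the ill-formed sequence was refused
```

The same holds for UTF-16: a lone surrogate code unit is ill-formed and has no interpretation as a character.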

I've enjoyed having you poke at the proposal, and I'm sure that airing the
discussion in this manner is going to be a helpful background to the
discussion that will ensue when the proposal is taken up again in committee.

A./



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT