Mail could not be delivered

From: [email protected]
Date: Fri Jul 02 1999 - 21:09:00 EDT

Next message: Jonathan Coxhead: "RE: Superscript asterisk"
Previous message: Kenneth Whistler: "RE: Plain Text [**NOT**]"

****** Message from InterScan E-Mail VirusWall NT ******

The following mail could not be delivered.
Reason: Exceeded Maximum Delivery Attempts

***************** End of message ***************

Message-Id: <[email protected]>
Errors-To: [email protected]
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
From: Asmus Freytag <[email protected]>
To: Unicode List <[email protected]>
Date: Thu, 1 Jul 1999 16:44:22 -0700 (PDT)
Subject: Re: Superscript asterisk

At 02:28 PM 7/1/99 -0700, Jonathan Coxhead wrote:
> I must admit to a degree of astonishment here ... Isn't this exactly
>the sort of combinatoric explosion that the existing combining marks
>were intended to defeat?

Yes and no. As I wrote, we did consider combining marks as a means to avoid
combinatorics (but note that they are very different from the stateful
controls in your suggestion!) but felt that using that mechanism to affect
the style of a letter was inappropriate.

> | One of the drawbacks of your
> | proposal is that it is open ended in that regard, i.e. it is a
> | complete style markup (just add font size ;-).
>
> I saw this as an advantage, in fact. Font size is already present:
>PRESENTATION SUGGESTION SMALL/WIDE are there. The first is useful in
>text with lots of acronyms, for example.

And for that very reason your controls would clash with markup languages
wishing to use Unicode as the character encoding.

> So does this mean that there will be separate characters encoding
>'X' as italic, bold, bold italic, fraktur, script, double-struck? (Plus
>others I may not be aware of?)

Correct. The Latin characters are only A-Z and a-z, no accented characters,
IPA etc. The Greek characters are A-Omega and alpha-omega, plus six or seven
variant characters that tend to get used in math. With the digits, this
comes to fewer than 1024 characters. (Not all sets are used with all styles.)

>And then also bold versions of COMBINING
>DOT ABOVE, COMBINING DIAERESIS, etc? (These are normally used with bold
>'X' to indicate velocity, acceleration, resp.)

No. Once MATH BOLD X exists as a character, the regular combining diaeresis
can be used (but you would want to make a glyph substitution for
publication quality printing, in a manner similar to the height adjustment
for putting an umlaut on 'a' vs. 'A'.)
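
A minimal Python sketch of that sequence. It assumes the code point under
which a bold math X was later standardized (U+1D417, MATHEMATICAL BOLD
CAPITAL X) and relies on Python's bundled Unicode database; the helper name
with_mark is hypothetical and only illustrates the idea that the ordinary
combining mark follows the styled base character.

```python
import unicodedata

# Assumption: U+1D417 is MATHEMATICAL BOLD CAPITAL X in the Unicode
# database shipped with Python.
MATH_BOLD_X = "\U0001D417"
COMBINING_DIAERESIS = "\u0308"  # the regular, unstyled combining mark


def with_mark(base: str, mark: str) -> str:
    """Hypothetical helper: append an ordinary combining mark to a base."""
    return base + mark


# "Acceleration": bold X with two dots above, as a two-code-point sequence.
accel = with_mark(MATH_BOLD_X, COMBINING_DIAERESIS)
names = [unicodedata.name(ch) for ch in accel]
```

No bold variant of the mark is involved; any adjustment of the dots to the
bold glyph is left to the font and the renderer.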

> And then superscript and subscript versions of all these (and
>presumably supersuperscript, supersubscript, subsuperscript and
>subsubscript as well)? (Including combining marks?)

Again, no. Superscripting is a style issue in math. You can write exp(1/x)
or, to use a TeX-like way of showing the intended layout,
e \super { 1 / x }. Both mean the same. For this and other reasons,
superscripting is best handled by some form of operator. Murray Sargent has
worked out a very useful scheme that does not require the full grouping
operators that you propose.

> And I'd guess there'd be a need for left- and right-half tilde and
>circumflex accents, because these can be used to bridge a character-
>pair together,

The combining half tildes already exist. And they don't need a start group.

>and with no START GROUP character, the possibility of
>using
>
> start group
> latin small letter a, presentation suggestion italic,
> latin small letter b, presentation suggestion italic,
> pop directional formatting,
> combining tilde
>
>is gone.
>
> | It is generally not a good idea to introduce such elements as
> | characters because they blur the line between text content and text
> | markup and create contentions and inconsistencies between information
> | at one level and information at another.
>
> This is a line that is already fairly blurred.
>
> But my suggestion involves solving *no* new problems, because the
>relationship between '<i>x</i>' (H T M L markup) and 'latin small
>letter x, presentation suggestion italic' is exactly analogous to the
>relationship between
>
> <span dir = rtl>Rtl</span>
>
>("high-level markup" form) and
>
> right-to-left override,
> latin capital letter r,
> latin small letter t,
> latin small letter l,
> pop directional formatting
>
>("low-level character encoding" form), which you have to be able to
>deal with sensibly already.

I'm currently working out a reference implementation for bidi that will
implement these characters in a way that is *precisely* identical to what
you would get with markup. One of the things I am learning is that, short
of duplicating the action of a markup parser, i.e. removing the controls
and replacing them by style runs, it is *almost* impossible to get the
right behavior.
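
To make "removing the controls and replacing them by style runs" concrete,
here is a rough Python sketch. It is not the bidi algorithm and not the
reference implementation mentioned above; it only shows the parser-like
step of stripping the explicit directional controls and recording, for each
run of text, which embeddings or overrides were open. The function name and
the run labels are assumptions made for illustration.

```python
from typing import List, Tuple

# Explicit directional formatting characters.
LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"
OPENERS = {
    LRE: "ltr-embed",
    RLE: "rtl-embed",
    LRO: "ltr-override",
    RLO: "rtl-override",
}

Run = Tuple[str, Tuple[str, ...]]


def controls_to_runs(text: str) -> List[Run]:
    """Hypothetical helper: split text into (content, open-controls) runs."""
    runs: List[Run] = []
    stack: List[str] = []  # currently open embeddings/overrides
    buf: List[str] = []    # characters of the run being collected

    def flush() -> None:
        if buf:
            runs.append(("".join(buf), tuple(stack)))
            buf.clear()

    for ch in text:
        if ch in OPENERS:
            flush()
            stack.append(OPENERS[ch])
        elif ch == PDF:
            flush()
            if stack:  # an unmatched PDF is simply ignored in this sketch
                stack.pop()
        else:
            buf.append(ch)
    flush()
    return runs


example = controls_to_runs("a" + RLO + "Rtl" + PDF + "b")
```

The result has the same shape a markup parser would produce for
a<span dir=rtl>Rtl</span>b: the controls are gone and the direction lives
on the runs.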

If the bidi algorithm had been invented after the rise of HTML, it might
well have been the case that we would NOT have coded these characters.

> However, I shall endeavour to get to the conference in San
>Jose---it's all of 10 miles down the road :-)

It's usually a good spot to check out what's happening and to talk with
people about early stages of proposals. A year ago we had the first ad-hoc
meetings on the math issue with Don Knuth who stopped by.

> Just for amusement, here's another plain-text formula:
>
> / N
> | dx
> | --
> | x
> / x = 0
>
>(that's supposed to be an integral). With just those 13 characters, we
>can write
>
> integral,
> start group,
> latin small letter x, presentation suggestion italic,
> equals,
> digit zero,
> pop directional formatting,
> presentation suggestion subscript,
> latin capital letter n,
> presentation suggestion superscript,
> start group,
> latin small letter d,
> latin small letter x, presentation suggestion italic,
> pop directional formatting,
> fraction slash,
> latin small letter x, presentation suggestion italic.
>
> (You might see this as 'Sx=0Ndx/x', if your rendering agent is
>particularly dim. But that might be enough.)

Murray has worked this out without start groups.

>
> It seems to me that this is infinitely more elegant than a solution
>involving hundreds of new codings of Latin and Greek (and Hebrew)
>characters.

I won't argue elegance with you (or anyone here). My current understanding
of this issue has come to where I think that mathematicians' use of letters
and ordinary text use of letters are distinct. In math, it makes no sense
to have both a LATIN CAPITAL A and a GREEK CAPITAL ALPHA. With most fonts,
readers can't tell them apart, and unlike in text there is no word-context
for a variable. Computer Modern fonts all explicitly unified these letters
in all styles.

Now for text, this is a terrible way to do things, since here operations
such as case transformations and sorting are common and A and ALPHA
capitalize and sort differently. And usually they are not intermixed in the
same word.
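
A small Python illustration of the difference, using the built-in string
operations and plain code point order as a stand-in for real collation
(which is an assumption; a proper sort would use a collation library). The
last lines assume the later-standardized U+1D400, MATHEMATICAL BOLD CAPITAL
A, to show a math-styled letter that takes no part in case mapping.

```python
LATIN_A = "A"           # U+0041 LATIN CAPITAL LETTER A
GREEK_ALPHA = "\u0391"  # U+0391 GREEK CAPITAL LETTER ALPHA

# The two look alike in most fonts but lowercase to different letters.
lower_latin = LATIN_A.lower()      # LATIN SMALL LETTER A
lower_greek = GREEK_ALPHA.lower()  # GREEK SMALL LETTER ALPHA

# Code point order (a crude proxy for collation) keeps the scripts apart.
ordered = sorted([GREEK_ALPHA, "B", LATIN_A])

# Assumption: U+1D400 is MATHEMATICAL BOLD CAPITAL A. It has no lowercase
# mapping, so lower() leaves it unchanged.
MATH_BOLD_A = "\U0001D400"
lower_math = MATH_BOLD_A.lower()
```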

Therefore it makes more sense than one would at first think to treat Math
Styled Alphabetics as a form of symbol character that just happens to
have the shape of a letter. (Letterlike Symbols is the name we have for the
original subset in Unicode). Oh, and I should mention that we will _not_
duplicate the existing letterlike symbols, but leave holes in the new
sets, so that we don't introduce a nasty dual coding nightmare.
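
The "holes" can be observed in the sets as they were eventually
standardized. The Python sketch below assumes the later code point
assignments (italic small letters starting at U+1D44E) and Python's bundled
Unicode database; is_assigned is a hypothetical helper. The slot where an
italic small h would fall is unassigned, because that character already
exists among the Letterlike Symbols as PLANCK CONSTANT.

```python
import unicodedata


def is_assigned(cp: int) -> bool:
    """Hypothetical helper: True if the code point is not category Cn."""
    return unicodedata.category(chr(cp)) != "Cn"


ITALIC_SMALL_G = 0x1D454       # MATHEMATICAL ITALIC SMALL G
ITALIC_SMALL_H_SLOT = 0x1D455  # the hole: left unassigned
PLANCK_CONSTANT = 0x210E       # existing letterlike symbol (italic h)

g_assigned = is_assigned(ITALIC_SMALL_G)
h_slot_assigned = is_assigned(ITALIC_SMALL_H_SLOT)
planck_name = unicodedata.name(chr(PLANCK_CONSTANT))
```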

A./


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT