From: Asmus Freytag (email@example.com)
Date: Thu Apr 14 2011 - 10:53:34 CDT
On 4/14/2011 7:13 AM, Doug Ewell wrote:
> Hans Aberg<haberg dash 1 at telia dot com> wrote:
>>>> Unicode does not have characters for say superscripts and
>>>> subscripts, which are essential to math. My guess it would be too
>>>> complicated to require it for current text-only renderers, but in
>>>> the future that might change.
>>> No, because in math, superscript is not a character attribute but a
>>> formatting style that is applied to any term or formula and that can
>>> be fully (and infinitely) nested.
>>> That abstraction is better handled in markup than in plain text.
>>> (Unlike the mathalphanumerics, such markup is still independent of
>>> the font).
>> That is so in rendering programs like TeX, because one does not enter
>> the math so that it can be parsed semantically. One enters
>> superscripts how they should be rendered and not by the logical
>> structure of the formula.
>> That is different if one has say a theorem prover. Then one can enter
>> a formula, let the program parse it into an AST, and from that infer
>> how it should be rendered, for example, where to put parentheses.
> I don't follow this. Asmus' point is that superscript can be applied,
> not only to any arbitrary character that can be used in a math
> expression, but also at any arbitrary level of nesting. After Unicode
> has finished adding superscript versions of every imaginable math
> character, including all of the math alphanumerics, it would then have
> to add second-level, third-level, etc. versions of every character, so
> that one could enter "a to the b to the c to the (d times square root of
> 2)" in plain text. And don't forget subscripts of superscripts, and
> vice versa.
> A counterargument that this is going too far, that Unicode wouldn't need
> to encode arbitrary levels of superscript/subscript nesting but only
> one, is basically an agreement with Asmus that this problem is best
> solved by (semantic) markup.
The distinction between "semantic" and "presentational" markup is an
important one here. This is a distinction that is figuring prominently
in the design discussions for HTML5, for example, but it has not been
dealt with very explicitly, up to now, in discussions of character encoding.
In character encoding, all markup is implicitly supposed to be
presentational, with the semantics represented in the plain text layer.
If that simple model were appropriate in all circumstances, then any
time you need any markup at all, you have "rich text" and if rich text
is already required, why would anyone want to encode a distinction in
However, this assumes that all markup is presentational. In the example
for mathematical notation we see that Unicode encodes characters for
those distinctions that would require presentational markup (appearance
of symbols), while not encoding characters for distinctions that require
semantic markup (scoping of expressions, nesting of expressions,
including super/subscript). Put another way, in mathematical notation
the "atoms" include elements that would be styled (presentational) in a
regular text context.
In phonetic notations (except some of the odd cases recently introduced)
super and subscript are atomic in this sense and not presentational.
However, where super and subscripts become expressions (with parens or
slashes), then the question needs to be asked (and is being asked)
whether these aspects of phonetic notations shouldn't best be represented
with semantic markup.
We are familiar with user interfaces that present "bold", "italic" etc.
as attributes of characters, when typographically, these are really
separate fonts (albeit conceived in concert with the regular font).
Viewed in that way, the distinction between bold and italic forms and
black letter, openface, sans-serif and monospace forms is simply a
matter of degree and convention. All of these variations require font
selection. Font selection is the ultimate in presentational markup.
You could say that encoding the mathematical alphanumerics means that
you can create mathematical text where one doesn't need font-selection
to carry the semantics of a document, while you still need semantic
markup. In particular, one doesn't need font-selection at the level of
individual "atoms" of the notation.
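
As a small illustration of atoms carrying their own identity in plain
text, here is a sketch using Python's standard unicodedata module. The
choice of U+1D504 is just an example; any of the mathematical
alphanumerics would behave similarly.

```python
import unicodedata

# U+1D504 is a separate character from U+0041, so no font markup is
# needed to tell the two symbols apart in plain text.
fraktur_a = "\U0001D504"

print(unicodedata.name(fraktur_a))           # MATHEMATICAL FRAKTUR CAPITAL A
print(fraktur_a == "A")                      # False

# The character database records the relationship as a <font>
# compatibility decomposition; compatibility normalization (NFKC)
# folds the symbol back to the ordinary letter, losing the distinction.
print(unicodedata.decomposition(fraktur_a))  # <font> 0041
print(unicodedata.normalize("NFKC", fraktur_a))  # A
```

The last line is a reminder that the distinction only survives if the
text is not passed through a compatibility normalization.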
Super- and subscript are a combination of relative positioning and size,
and as Doug and I are pointing out here, this positioning applies in
principle to the whole expression (whether it's a single variable or a
fraction or a more complicated expression). Positioning in mathematical
notation already requires markup for scoping, hence singling out
super/subscript isn't adding anything useful.
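
To make the scoping concrete, Doug's example can be written in TeX-style
markup roughly as follows. The grouping is one reading of his wording,
and the second line is an invented example of a subscript inside a
superscript. The braces carry the scope at every level of nesting, which
is exactly what a per-character superscript could not do.

```latex
a^{b^{c^{d\sqrt{2}}}}
x^{y_{z}}
```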
In conclusion, the lesson learned is that the simplistic character-glyph
model, which recognizes only semantic plain text and presentation markup,
needs to be extended to include a hybrid model where atomic semantics
are present in the plain text layer, scoped semantics are present in
semantic markup and presentational markup isn't required to carry any of
the semantic information. This model is characterized by the property
that it does not require markup (such as entity markup) for the
representation of atomic entities, and that presentational markup can be
applied in ways that are clearly separate from the semantics (e.g.
choice of a particular Fraktur/Black letter font to render generic
"black letter" symbols).
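
A minimal sketch of that hybrid model, in Python, might look like the
following. The names Atom, Sup, render and depth are hypothetical and
chosen only for illustration, and the TeX-like output is just one
possible serialization of the semantic markup.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Atom:
    """Atomic semantics: carried by the plain-text characters themselves."""
    text: str


@dataclass
class Sup:
    """Scoped semantics: a superscript applied to a whole sub-expression."""
    base: "Node"
    script: "Node"


Node = Union[Atom, Sup]


def render(node: Node) -> str:
    """Serialize the tree to a TeX-like string (one possible notation)."""
    if isinstance(node, Atom):
        return node.text
    return f"{render(node.base)}^{{{render(node.script)}}}"


def depth(node: Node) -> int:
    """Superscript nesting depth; it is unbounded, so it lives in markup."""
    if isinstance(node, Atom):
        return 0
    return max(depth(node.base), 1 + depth(node.script))


# U+1D504 MATHEMATICAL FRAKTUR CAPITAL A is an atom: its identity is in
# the plain text, and no font selection is needed to express it.
expr = Sup(Atom("\U0001D504"), Sup(Atom("b"), Atom("c")))
print(render(expr))
print(depth(expr))  # 2
```

Which Fraktur font is used to display the atom is left entirely to the
presentation layer and appears nowhere in this structure.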