Re: math alphabets, WAS: Proprietary Card Decks

From: Asmus Freytag (
Date: Thu Apr 14 2011 - 10:53:34 CDT

    On 4/14/2011 7:13 AM, Doug Ewell wrote:
    > Hans Aberg<haberg dash 1 at telia dot com> wrote:
    >>>> Unicode does not have characters for, say, superscripts and
    >>>> subscripts, which are essential to math. My guess is that it would be too
    >>>> complicated to require it for current text-only renderers, but in
    >>>> the future that might change.
    >>> No, because in math, superscript is not a character attribute but a
    >>> formatting style that is applied to any term or formula and that can
    >>> be fully (and infinitely) nested.
    >>> That abstraction is better handled in markup than in plain text.
    >>> (Unlike the mathalphanumerics, such markup is still independent of
    >>> the font).
    >> That is so in rendering programs like TeX, because one does not enter
    >> the math so that it can be parsed semantically. One enters
    >> superscripts how they should be rendered and not by the logical
    >> structure of the formula.
    >> That is different if one has, say, a theorem prover. Then one can enter
    >> a formula, let the program parse it into an AST, and from that infer
    >> how it should be rendered, for example, where to put parentheses.
    > I don't follow this. Asmus' point is that superscript can be applied,
    > not only to any arbitrary character that can be used in a math
    > expression, but also at any arbitrary level of nesting. After Unicode
    > has finished adding superscript versions of every imaginable math
    > character, including all of the math alphanumerics, it would then have
    > to add second-level, third-level, etc. versions of every character, so
    > that one could enter "a to the b to the c to the (d times square root of
    > 2)" in plain text. And don't forget subscripts of superscripts, and
    > vice versa.
    > A counterargument that this is going too far, that Unicode wouldn't need
    > to encode arbitrary levels of superscript/subscript nesting but only
    > one, is basically an agreement with Asmus that this problem is best
    > solved by (semantic) markup.
    The distinction between "semantic" and "presentation" markup is an
    important one here. It figures prominently in the design discussions
    for HTML5, for example, but so far it has not been dealt with very
    explicitly in discussions of character encoding.

    In character encoding, all markup is implicitly supposed to be
    presentational, with the semantics represented in the plain text layer.

    If that simple model were appropriate in all circumstances, then any
    time you need any markup at all, you have "rich text"; and if rich text
    is already required, why would anyone want to encode a distinction in
    plain text?

    However, this assumes that all markup is presentational. In the example
    of mathematical notation, we see that Unicode encodes characters for
    those distinctions that would otherwise require presentational markup
    (the appearance of symbols), while not encoding characters for
    distinctions that require semantic markup (scoping and nesting of
    expressions, including super/subscript). Looked at another way, this
    means that in mathematical notation the "atoms" include elements that
    would be styled (presentational) in a regular text context.
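To make the "atom" point concrete, here is a small sketch using Python's standard `unicodedata` module (the letter A and the four styles are an arbitrary choice). Bold, Fraktur and double-struck A are separate code points with their own character names, so the distinction survives in plain text without any font selection. NFKC normalization, which discards compatibility distinctions, folds all of them back to a plain A.

```python
import unicodedata

# One letter in four styles; in ordinary text these would differ only by font.
samples = {
    "plain": "A",
    "bold": "\U0001D400",
    "fraktur": "\U0001D504",
    "double-struck": "\U0001D538",
}

for style, ch in samples.items():
    # Each styled form is its own code point with its own character name,
    # while NFKC maps every one of them to LATIN CAPITAL LETTER A.
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"{style:14} U+{ord(ch):04X}  {unicodedata.name(ch)}  NFKC -> {nfkc}")
```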

    In phonetic notations (except for some of the odd cases recently
    introduced), super- and subscripts are atomic in this sense and not
    presentational. However, where super- and subscripts become expressions
    (with parens or slashes), the question needs to be asked (and is being
    asked) whether these aspects of phonetic notation would not be best
    represented with semantic markup.
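The atomic phonetic case can be sketched the same way; the two transcriptions below are only examples. Aspiration and labialization are written with modifier letters, which have general category Lm (letter, modifier) rather than being a superscript style applied to an ordinary h or w. One caveat worth noting: NFKC maps them to the base letters, so normalizing a transcription that way erases the phonetic distinction.

```python
import unicodedata

aspirated_t = "t\u02B0"   # [tʰ]: t + U+02B0 MODIFIER LETTER SMALL H
labialized_k = "k\u02B7"  # [kʷ]: k + U+02B7 MODIFIER LETTER SMALL W

for word in (aspirated_t, labialized_k):
    base, mod = word
    # The raised letter is a character in its own right (category Lm);
    # compatibility normalization turns it back into the plain letter.
    print(f"{base} + {unicodedata.name(mod)} ({unicodedata.category(mod)})"
          f" -> NFKC {unicodedata.normalize('NFKC', word)!r}")
```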

    We are familiar with user interfaces that present "bold", "italic" etc.
    as attributes of characters, when typographically, these are really
    separate fonts (albeit conceived in concert with the regular font).
    Viewed in that way, the distinction between bold and italic forms and
    black letter, openface, sans-serif and monospace forms is simply a
    matter of degree and convention. All of these variations require font
    selection. Font selection is the ultimate in presentational markup.

    You could say that encoding the mathematical alphanumerics means that
    you can create mathematical text that does not need font selection to
    carry its semantics, even though it still needs semantic markup. In
    particular, it does not need font selection at the level of individual
    "atoms" of the notation.
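Producing such atoms involves no font machinery, only a choice of code points. The helper `to_fraktur` below is hypothetical and covers only ASCII letters. It also exposes a wrinkle of the encoding: the Fraktur capitals C, H, I, R and Z were already encoded as black-letter symbols in the Letterlike Symbols block, so their slots in the Mathematical Alphanumeric Symbols block are unassigned and have to be special-cased.

```python
import unicodedata

# Fraktur capitals that predate the math alphanumerics block: they live in
# Letterlike Symbols, and their slots in U+1D504..U+1D51D are unassigned.
FRAKTUR_HOLES = {
    "C": "\u212D",  # BLACK-LETTER CAPITAL C
    "H": "\u210C",  # BLACK-LETTER CAPITAL H
    "I": "\u2111",  # BLACK-LETTER CAPITAL I
    "R": "\u211C",  # BLACK-LETTER CAPITAL R
    "Z": "\u2128",  # BLACK-LETTER CAPITAL Z
}


def to_fraktur(text: str) -> str:
    """Hypothetical helper: map ASCII letters to their Fraktur math atoms."""
    out = []
    for ch in text:
        if ch in FRAKTUR_HOLES:
            out.append(FRAKTUR_HOLES[ch])
        elif "A" <= ch <= "Z":
            out.append(unicodedata.lookup(f"MATHEMATICAL FRAKTUR CAPITAL {ch}"))
        elif "a" <= ch <= "z":
            out.append(unicodedata.lookup(f"MATHEMATICAL FRAKTUR SMALL {ch.upper()}"))
        else:
            out.append(ch)  # digits, spaces, etc. pass through unchanged
    return "".join(out)


print(to_fraktur("Lie algebra g"))
```

For example, `to_fraktur("R")` returns U+211C BLACK-LETTER CAPITAL R rather than a code point from the math alphanumerics block; either way the result is plain text, and any font able to display it will do.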

    Superscript and subscript are a combination of relative positioning and
    size, and as Doug and I are pointing out here, this positioning applies
    in principle to the whole expression (whether it's a single variable, a
    fraction, or a more complicated expression). Positioning in mathematical
    notation already requires markup for scoping; hence singling out
    super/subscript for character encoding would not add anything useful.
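For contrast, scoped structure belongs in markup. The sketch below assumes a toy expression tree (the node classes `Var`, `Num`, `Times`, `Sqrt` and `Power` are made up for illustration, in the spirit of the AST Hans mentions) and serializes it as Content MathML, where the extent of each exponent is carried by element nesting. Doug's "a to the b to the c to the (d times square root of 2)" needs three nested `power` applications and no new characters; further levels cost nothing.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    name: str


@dataclass
class Num:
    value: str


@dataclass
class Times:
    left: "Expr"
    right: "Expr"


@dataclass
class Sqrt:
    body: "Expr"


@dataclass
class Power:
    base: "Expr"
    exponent: "Expr"


Expr = Union[Var, Num, Times, Sqrt, Power]


def to_content_mathml(e: Expr) -> str:
    """Serialize the tree; each operator's scope is the element it wraps."""
    if isinstance(e, Var):
        return f"<ci>{e.name}</ci>"
    if isinstance(e, Num):
        return f"<cn>{e.value}</cn>"
    if isinstance(e, Times):
        return f"<apply><times/>{to_content_mathml(e.left)}{to_content_mathml(e.right)}</apply>"
    if isinstance(e, Sqrt):
        # <root/> without a <degree> child denotes a square root.
        return f"<apply><root/>{to_content_mathml(e.body)}</apply>"
    if isinstance(e, Power):
        return f"<apply><power/>{to_content_mathml(e.base)}{to_content_mathml(e.exponent)}</apply>"
    raise TypeError(f"unknown node: {e!r}")


# "a to the b to the c to the (d times square root of 2)":
# exponentiation groups to the right, so each Power nests inside the last.
expr = Power(Var("a"), Power(Var("b"), Power(Var("c"), Times(Var("d"), Sqrt(Num("2"))))))
print(to_content_mathml(expr))
```

A renderer working from this tree decides how far to raise and shrink each exponent; the plain-text atoms (`a`, `b`, `2`, or a Fraktur letter) stay the same at every level. The sketch does no XML escaping of names, which a real serializer would need.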

    In conclusion, the lesson learned is that the simplistic character-glyph
    model, which recognizes only semantic plain text and presentational
    markup, needs to be extended to include a hybrid model in which atomic
    semantics are present in the plain text layer, scoped semantics are
    present in semantic markup, and presentational markup isn't required to
    carry any of the semantic information. This model is characterized by
    the property that it does not require markup (such as entity markup) for
    the representation of atomic entities, and that presentational markup
    can be applied in ways that are clearly separate from the semantics
    (e.g. the choice of a particular Fraktur/black-letter font to render
    generic "black letter" symbols).


    This archive was generated by hypermail 2.1.5 : Thu Apr 14 2011 - 10:56:53 CDT