Re: MS Math layout (was: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts)

From: Philippe Verdy (
Date: Tue Jan 30 2007 - 18:04:15 CST

    From: "Murray Sargent" <>
    > You can get some of the background of the Office 2007 math facility in
    > my blog ( The linear format input
    > language is defined in Unicode Technical Note #28
    > (
    > This technical note was produced using a beta version of Word
    > 2007 and illustrates the beauty of the math layout. Our approach
    > relies on Unicode's excellent support for mathematics and makes
    > extensive use of the math alphabetics.

    Some character choices in this document seem extremely arbitrary, and I don't think they are more readable or easier to understand than the corresponding "autocorrect" alias names (the \xxxxx form).
    The strangest assignments are those for phantoms and smashes (why diamonds, or arrows, which are completely counter-intuitive?) and the various grey/black boxes, which are not only difficult to input but also hard to read in linear plain-text format and hard to tell apart when printed.

    Although the syntax has its merits -- simpler notation compared to (La)TeX or MathML -- the "autocorrect" aliases will most often be preferred to the arbitrary Unicode characters that were chosen, especially when those characters cannot be represented in common input character sets; the traditional workaround there is to convert them to numeric character entities or, probably better, named entities. So I would have much preferred that the "autocorrect" representations use a notation similar to named entities, i.e. "&name;" instead of "\name". But "&" and ";" have been reserved in this notation for another usage, which complicates such input. The "\name" representation is convenient for LaTeX users, but probably not for others.
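
    To make the entity comparison concrete, here is a minimal Python sketch of the two traditional fallbacks for characters that do not fit an ASCII-only channel. It uses HTML's entity table from the standard library as a stand-in; the function names are hypothetical, and nothing here is part of the linear format itself.

```python
import html
from html.entities import codepoint2name


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_named_refs(text: str) -> str:
    """Use an HTML named entity when one exists, else a numeric reference.

    ASCII characters (including "&" itself) are passed through unchanged,
    so this is only an illustration, not a safe general-purpose escaper.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


formula = "α+β"
print(to_numeric_refs(formula))  # &#945;+&#946;
print(to_named_refs(formula))    # &alpha;+&beta;
print(html.unescape(to_named_refs(formula)) == formula)  # True
```

    Characters outside the HTML entity table, such as the math alphanumerics, fall back to numeric references in this sketch, which is exactly the readability problem that named aliases are meant to avoid.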

    Some notations are missing. How do you combine top/bottom operands with above-right/below-right operands? It seems that a notation should have been defined for the various positions (like those in the Unicode combining classes), as well as a notation for specifying how line breaks are permitted and handled, and therefore how rows are stacked, for example when writing:

      "some long sentence" / "some other long sentence"

    where both sentences should be allowed to wrap, with the two rectangular stacks of rows centered on the horizontal division line, or in:

      x + y + ... "(some long list of terms)" = a + b + c + ... + "(some other long list of terms)"

    where it should be possible to say that the long list of terms should be wrapped into a set of rows that does not extend to the left of the equal sign. That is, this equation contains three elements from left to right, and these elements are aligned so that "x" aligns vertically to the bottom, "=" aligns to the middle of the last row of the previous cell, and the first row of the terms is aligned too. If line wrapping is needed, the three cells become candidates for line wraps, as if this were a table whose cell widths are adjusted. In this case, the horizontal alignment of line wraps must be preserved, and this also affects the way the whole text is centered in each cell.
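
    As a rough plain-text illustration of the three-cell layout described above, the following Python sketch wraps the left and right sides independently and joins the last row of the left cell to the first row of the right cell around "=". The function name and the fixed cell widths are assumptions made for the example; a real layout engine would adjust the widths the way a table does, and would center rather than left-align.

```python
import textwrap


def layout_equation(lhs: str, rhs: str, lhs_width: int, rhs_width: int) -> str:
    """Lay out 'lhs = rhs' as three cells: wrapped lhs, '=', wrapped rhs.

    The '=' sits on the last row of the left cell, and continuation rows
    of the right cell never start to the left of the column after '= '.
    """
    left = textwrap.wrap(lhs, lhs_width, break_long_words=False) or [""]
    right = textwrap.wrap(rhs, rhs_width, break_long_words=False) or [""]

    rows = list(left[:-1])                                  # upper rows of the left cell
    rows.append(f"{left[-1]:<{lhs_width}} = {right[0]}")    # the row carrying '='
    pad = " " * (lhs_width + 3)                             # left cell plus ' = '
    rows.extend(pad + line for line in right[1:])           # aligned continuation rows
    return "\n".join(rows)


print(layout_equation("x + y + z", "a + b + c + d + e + f", 9, 9))
```

    Every continuation row starts with the same padding, so the wrapped terms stay in their own column to the right of the equal sign.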

    For me, this technical note just shows what can be done to make math input simpler, but many things in this specification are not designed to make input simpler or to handle such common cases.

    The technical note also leaves another issue unaddressed: how do you compute a canonical linear form that allows comparing two formulas that should render identically? Another interesting question: can such an expression be recognized so that it yields an unambiguous computation in programming languages, and how can the syntax recognizers of those languages be extended so that they allow inputting formulas and displaying them with a more mathematics-friendly rendering?
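
    One small piece of the canonical-form question can be sketched in Python: making two spellings of the same formula compare equal by expanding "\name" aliases to their characters and then applying Unicode NFC. The alias table below is a tiny hypothetical excerpt, and the function is an assumption for illustration, not something the technical note defines. It handles only spelling-level differences; structural equivalences (redundant parentheses, for example) would need a real parser.

```python
import re
import unicodedata

# Tiny illustrative alias table (hypothetical excerpt); a real one would be far larger.
ALIASES = {"alpha": "\u03b1", "beta": "\u03b2", "times": "\u00d7"}

_ALIAS = re.compile(r"\\([A-Za-z]+)")


def canonical(text: str) -> str:
    """Expand known \\name aliases, then normalize to NFC.

    Unknown aliases are left untouched. NFC is used rather than NFKC
    because NFKC would fold the mathematical alphanumerics into plain
    letters and so change the meaning of a formula.
    """
    expanded = _ALIAS.sub(lambda m: ALIASES.get(m.group(1), m.group(0)), text)
    return unicodedata.normalize("NFC", expanded)


print(canonical(r"\alpha+\beta") == canonical("α+β"))  # True
print(canonical("e\u0301") == canonical("\u00e9"))     # True
```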

    A final note: the set of keywords used for the "\autocorrect" notation should be localizable (i.e. there should be a way to say that "\de" in a French document is the same as "\of" in an English document), as should the set of special notational characters (possibly including the backslash itself), keeping a special meaning only for very few characters (as in XML, where only "&", "<" and ">" have special meaning in text elements, and "=", "/" and quotes have special meaning in element start tags). Specifying the language should be optional: it could be inherited when already specified at the document level, or given by a prefix declaration like "\lang(fr)", which selects a set of named entities (i.e. a namespace) preloaded from a dictionary.
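
    The localization idea could be sketched in Python as per-language dictionaries that map local keyword names onto one canonical set, with an optional "\lang(xx)" prefix overriding the language inherited from the document. The tables, function names and prefix handling are all hypothetical; they only illustrate the proposal and are not defined by the technical note.

```python
import re

# Hypothetical per-language tables: local keyword name -> canonical name.
LOCAL_NAMES = {
    "en": {"of": "of"},
    "fr": {"de": "of"},
}

_KEYWORD = re.compile(r"\\([^\W\d_]+)")        # backslash followed by letters
_LANG_PREFIX = re.compile(r"^\\lang\((\w+)\)")  # optional leading \lang(xx)


def to_canonical(text: str, lang: str) -> str:
    """Rewrite localized \\keywords to canonical names; unknown names are kept."""
    table = LOCAL_NAMES.get(lang, {})
    return _KEYWORD.sub(lambda m: "\\" + table.get(m.group(1), m.group(1)), text)


def normalize(text: str, document_lang: str = "en") -> str:
    """Use a \\lang(xx) prefix if present, else the language inherited from the document."""
    m = _LANG_PREFIX.match(text)
    if m:
        return to_canonical(text[m.end():], m.group(1))
    return to_canonical(text, document_lang)


print(normalize(r"\lang(fr)\root n\de x"))               # \root n\of x
print(normalize(r"\root n\de x", document_lang="fr"))    # \root n\of x
```

    Keeping the canonical names in one place means that two documents written with different keyword sets reduce to the same stored form.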

    The language should be inherited by text elements without mathematical meaning that are embedded in mathematical formulas. And why not allow rich-text inclusions and CSS formatting such as color in formulas, or allow giving identifiers to parts of a formula, which external scripting could then use? It's good to have a simple syntax, but making it extensible and compatible with richer notations (like XML) seems useful too; to allow this, some notation should be reserved for such optional implementations.
