Re: Maths letters and digits (was: Is it true that Unicode is insufficient for Oriental languages?)

From: Asmus Freytag
Date: Fri May 23 2003 - 00:57:29 EDT

    At 12:33 AM 5/23/03 +0200, Philippe Verdy wrote:
    >From: "Asmus Freytag"
    > > Styled text uses markup. However, for specialized texts, such as
    > > mathematics, where loss of style-markup can completely eradicate the
    > > meaning of the text, several symbol sets have been added to Unicode, where
    > > the symbols look like styled letters, but function very differently (i.e.
    > > as mathematical symbols).
    >This creates some conflicting interest: which semantic for characters
    >encoded in fonts: a style semantic if one wants to present Latin or
    >Cyrillic text with a Gothic style?

    There's no conflict: if it's text, you use style. If it's a symbol, you use
    the character.
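
    The distinction can be inspected in the character database. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the choice of U+1D504 as the sample character and the helper name `describe` are illustrative only. It shows that a mathematical Fraktur letter is a separate character whose only link to the plain letter is a compatibility `<font>` mapping, so compatibility normalization folds the symbol back into ordinary text.

```python
import unicodedata

# Sample character chosen for illustration: U+1D504.
FRAKTUR_A = "\U0001D504"


def describe(ch: str) -> dict:
    """Collect the UCD properties relevant to the text-versus-symbol question."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        # A "<font>" tag marks a compatibility (not canonical) mapping.
        "decomposition": unicodedata.decomposition(ch),
        # NFKC applies compatibility mappings and so drops the distinction.
        "nfkc": unicodedata.normalize("NFKC", ch),
        # NFC leaves the character untouched.
        "nfc_unchanged": unicodedata.normalize("NFC", ch) == ch,
    }


info = describe(FRAKTUR_A)
for key, value in info.items():
    print(f"{key}: {value!r}")
```

    Running text that is merely set in a Fraktur font would still be encoded with the plain letters, with the font choice carried by markup or styling.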

    >Also there are exceptions in the new mathematical block, where some letters
    >were not encoded considering that they are already available in other
    >blocks, either as Letter-like symbols or as plain characters.

    The existing Letterlike Symbols already contained the most frequently used
    of these symbols. Not re-encoding the existing ones underscores that
    they map to a single set of mathematical symbols and are not alphabets for
    styled text.
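
    One such gap can be checked directly. This is a minimal sketch, assuming Python's standard `unicodedata` module; the helper `name_or_none` is a hypothetical convenience, and the italic small h is just one example of the exceptions. The slot between the mathematical italic g and i is left unassigned because the italic h symbol already exists among the Letterlike Symbols as U+210E.

```python
import unicodedata


def name_or_none(cp: int):
    """Return the character name, or None if the code point is unassigned."""
    return unicodedata.name(chr(cp), None)


ITALIC_SMALL_G = 0x1D454
RESERVED_SLOT = 0x1D455   # where an italic small h would otherwise sit
ITALIC_SMALL_I = 0x1D456
PLANCK_CONSTANT = 0x210E  # the pre-existing Letterlike symbol

names = {
    cp: name_or_none(cp)
    for cp in (ITALIC_SMALL_G, RESERVED_SLOT, ITALIC_SMALL_I, PLANCK_CONSTANT)
}

for cp, name in names.items():
    print(f"U+{cp:04X}: {name}")
```

    The sequence g, (gap), i in the mathematical block plus the single Letterlike character together make up one italic alphabet of symbols.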

    >What would happen to a mathematical text rendered with a Gothic style? One
    >could not make a semantic distinction between plain characters symbols,
    >and Gothic style symbols.

    Please read the TR, which addresses this issue.

    >The current encoding assumes that mathematical text uses only fonts in a
    >basic style using only the "representative glyphs" shown in charts.
    >Depending on the fonts available to render a particular text style
    >(independently of its abstract character semantic), such distinction will
    >be hard to make.

    Again, see the TR.

    >This just proves that mathematical symbols use also the plain standard
    >scripts, whose rendered style is then suddenly important. If accuracy in
    >semantics was needed, clearly we would need to define separate mathematical
    >characters for the basic style, but Unicode chose to unify them...

    In the context of mathematics, not all font choices are suitable. I think
    that's something the mathematicians understand already, and the explanation
    for the non-experts is in the TR.

    >Conclusion: mathematical symbols are a separate script, but Unicode unifies
    >this set incoherently as it assumes a default style for all scripts. So
    >can we say that general purpose fonts for extended Latin with Gothic style
    >are Unicode-compliant?

    Yes, but they may not be usable for mathematics. There are many fonts that
    are not usable for many purposes. Publishing an academic thesis with a
    'wedding invitation' style font (fancy script style) would not be apropos;
    in fact, many institutions regulate the acceptable style for the text
    portion of such documents rather minutely.

    >Also it is not clear how serif and sans-serif variants of mathematical
    >symbols will behave with other non-mathematical text, and where we can say
    >that the encoded text is mathematical and where it is not, so where a
    >required style MUST be applied.

    As usual, you will find that character encoding as such never solves *ALL*
    possible problems. Character encoding is concerned with being able to
    express the core semantic differentiation required to carry the content. A
    full document will usually require additional information (typically markup).

    >Maybe this should require defining new "BEGIN MATHS" and "END MATHS" (or
    >"BEGIN TEXT") abstract characters and encoding them (as format control
    >characters) for the same semantic reasons Unicode defined and encoded the
    >"Invisible Function Application" or "Invisible Comma" or "Invisible
    >Multiplication Operator" (I'm not sure if these are their exact names, so
    >look in UCD if you need them).

    It's INVISIBLE SEPARATOR and INVISIBLE TIMES. Which you would have known if
    you had read the TR.
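
    The names can also be looked up in the UCD itself. A minimal sketch, assuming Python's standard `unicodedata` module and assuming that the characters under discussion are the invisible operators at U+2061 through U+2063; the third of these, the function application operator, has no "INVISIBLE" in its name.

```python
import unicodedata

# Assumed range: the three invisible mathematical operators.
INVISIBLE_OPERATORS = range(0x2061, 0x2064)

# Map each code point to its formal name and general category.
operators = {
    cp: (unicodedata.name(chr(cp)), unicodedata.category(chr(cp)))
    for cp in INVISIBLE_OPERATORS
}

for cp, (name, category) in operators.items():
    print(f"U+{cp:04X}  {category}  {name}")
```

    All three are format characters (general category Cf), which is the same class the proposed "BEGIN MATHS" and "END MATHS" controls would have fallen into.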



