Re: Maths letters and digits (was: Is it true that Unicode is insufficient for Oriental languages?)

From: Asmus Freytag (
Date: Fri May 23 2003 - 00:57:29 EDT


    At 12:33 AM 5/23/03 +0200, Philippe Verdy wrote:
    >From: "Asmus Freytag" <>
    > > Styled text uses markup. However, for specialized texts, such as
    > > mathematics, where loss of style-markup can completely eradicate the
    > > meaning of the text, several symbol sets have been added to Unicode, where
    > > the symbols look like styled letters, but function very differently (i.e.
    > > as mathematical symbols).
    >This creates a conflict of interest: which semantics apply to characters
    >encoded in fonts? A style semantic, if one wants to present Latin or
    >Cyrillic text in a Gothic style?

    There's no conflict: if it's text, you use style. If it's a symbol, you use
    the character.
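A minimal Python sketch (standard library only) of why the two cases differ: a mathematical bold *v* is a separate character from a plain *v*. It carries a `<font>` compatibility decomposition rather than any markup, so the distinction survives when style information is lost. The example pairing (bold for a vector, plain for a scalar) and the helper name `fold_compat` are illustrative assumptions. Note that NFKC normalization deliberately folds the symbol back to the plain letter, which is why it should not be applied blindly to mathematical text.

```python
import unicodedata

def fold_compat(text: str) -> str:
    """NFKC normalization: maps <font> compatibility characters,
    including the math alphanumerics, back to plain letters."""
    return unicodedata.normalize("NFKC", text)

# Hypothetical example: bold v (say, a vector) next to plain v (say, a scalar).
bold_v = unicodedata.lookup("MATHEMATICAL BOLD SMALL V")
expr = bold_v + "v"

print(len(expr), bold_v == "v")           # two different characters
print(unicodedata.category(bold_v))       # a lowercase letter, not a style
print(unicodedata.decomposition(bold_v))  # compatibility mapping to plain v
print(fold_compat(expr))                  # after NFKC the distinction is gone
```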

    >Also, there are exceptions in the new mathematical block, where some letters
    >were not encoded because they are already available in other blocks,
    >either as Letterlike Symbols or as plain characters.

    The existing Letterlike Symbols block already contained the highest-frequency
    (in terms of use) members of these symbol sets. Not re-encoding the existing
    ones underscores that they map to a single set of mathematical symbols and
    are not alphabets for styled text.
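The practical consequence of not re-encoding the Letterlike symbols shows up in a minimal Python sketch of a hypothetical `to_math_fraktur` helper. The Fraktur capitals C, H, I, R and Z are reserved holes in the Mathematical Alphanumeric Symbols block, so a correct mapping has to fall back to the existing BLACK-LETTER characters in the Letterlike Symbols block. The helper and constant names are illustrative assumptions, not part of any library.

```python
import unicodedata

# In the Mathematical Alphanumeric Symbols block, the Fraktur capitals
# C, H, I, R and Z are reserved holes: these letters already existed in
# the Letterlike Symbols block (U+2100..U+214F) and were not re-encoded.
FRAKTUR_HOLES = {
    "C": "\u212D",  # BLACK-LETTER CAPITAL C
    "H": "\u210C",  # BLACK-LETTER CAPITAL H
    "I": "\u2111",  # BLACK-LETTER CAPITAL I
    "R": "\u211C",  # BLACK-LETTER CAPITAL R
    "Z": "\u2128",  # BLACK-LETTER CAPITAL Z
}

FRAKTUR_CAPITAL_A = 0x1D504  # MATHEMATICAL FRAKTUR CAPITAL A
FRAKTUR_SMALL_A = 0x1D51E    # MATHEMATICAL FRAKTUR SMALL A (small letters have no holes)

def to_math_fraktur(ch: str) -> str:
    """Map one ASCII letter to its mathematical Fraktur symbol (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError(f"expected a single character: {ch!r}")
    if ch in FRAKTUR_HOLES:
        return FRAKTUR_HOLES[ch]
    if "A" <= ch <= "Z":
        return chr(FRAKTUR_CAPITAL_A + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(FRAKTUR_SMALL_A + ord(ch) - ord("a"))
    raise ValueError(f"not an ASCII letter: {ch!r}")

for letter in "ABCHz":
    sym = to_math_fraktur(letter)
    print(letter, f"U+{ord(sym):04X}", unicodedata.name(sym))
```

Naively adding an offset for every capital would land on unassigned code points for the five hole letters; the lookup table is what keeps the result a single, consistent set of mathematical symbols.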

    >What would happen to a mathematical text rendered in a Gothic style? One
    >could not make a semantic distinction between plain character symbols
    >and Gothic-style symbols.

    Please read the TR, which addresses this issue.

    >The current encoding assumes that mathematical text uses only fonts in a
    >basic style, with only the "representative glyphs" shown in the charts.
    >Depending on the fonts available to render a particular text style
    >(independently of its abstract character semantics), such a distinction
    >will be hard to make.

    Again, see the TR.

    >This just proves that mathematical symbols also use the plain standard
    >scripts, whose rendered style then suddenly becomes important. If semantic
    >accuracy were needed, we would clearly need to define separate mathematical
    >characters for the basic style, but Unicode chose to unify them...

    In the context of mathematics, not all font choices are suitable. I think
    that's something the mathematicians understand already, and the explanation
    for the non-experts is in the TR.

    >Conclusion: mathematical symbols are a separate script, but Unicode unifies
    >this set incoherently, as it assumes a default style for all scripts. So
    >can we say that general-purpose fonts for extended Latin in a Gothic style
    >are Unicode-compliant?

    Yes, but they may not be usable for mathematics. Many fonts are unsuitable
    for many purposes. Publishing an academic thesis in a 'wedding invitation'
    font (a fancy script style) would not be apropos; in fact, many
    institutions regulate the acceptable style for the text portion of such
    documents rather minutely.

    >It is also not clear how serif and sans-serif variants of mathematical
    >symbols will behave alongside non-mathematical text, or where we can say
    >that the encoded text is mathematical and where it is not, and therefore
    >where a required style MUST be applied.

    As usual, you will find that character encoding as such never solves *ALL*
    possible problems. Character encoding is concerned with being able to
    express the core semantic differentiation required to carry the content. A
    full document will usually require additional information (typically markup).
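To illustrate the division of labour between encoding and markup, here is a minimal Python sketch. The `Run` class and `strip_markup` function are hypothetical stand-ins for real markup such as HTML with CSS; they are not any particular format. A Gothic heading gets its look from style markup, while the Fraktur symbol for a Lie algebra is encoded as the character itself. Stripping all markup loses the decorative styling, which is harmless, but keeps the semantic distinction between 𝔤 and g.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

@dataclass
class Run:
    """A span of text plus optional style markup (hypothetical model)."""
    text: str
    style: Optional[str] = None

def strip_markup(runs):
    """Drop all style information, keeping only the character content."""
    return "".join(r.text for r in runs)

frak_g = unicodedata.lookup("MATHEMATICAL FRAKTUR SMALL G")
doc = [
    Run("Gothic heading", style="gothic"),  # text: the style lives in markup
    Run(": let "),
    Run(frak_g),                            # symbol: the character itself
    Run(" be a Lie algebra."),
]

plain = strip_markup(doc)
print(plain)  # the heading loses its Gothic look; the symbol stays distinct
```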

    >Maybe this should require defining new "BEGIN MATHS" and "END MATHS" (or
    >"BEGIN TEXT") abstract characters and encoding them (as format control
    >characters) for the same semantic reasons that Unicode defined and encoded
    >the "Invisible Function Application", "Invisible Comma" or "Invisible
    >Multiplication Operator" (I'm not sure these are their exact names, so
    >look in the UCD if you need them).

    It's INVISIBLE SEPARATOR and INVISIBLE TIMES (plus FUNCTION APPLICATION),
    which you would have known if you had read the TR.
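For anyone who wants to experiment with these characters, a minimal Python sketch that looks up the invisible mathematical operators by their Unicode character names and confirms they are format characters (general category Cf). The expressions built at the end are hypothetical examples: "f(x) = 2a" with the function application and the product made explicit but invisible, and a matrix-entry subscript "ij" with an invisible separator between the indices.

```python
import unicodedata

# Format characters for implied mathematical operations (U+2061..U+2063).
NAMES = ["FUNCTION APPLICATION", "INVISIBLE TIMES", "INVISIBLE SEPARATOR"]
chars = {name: unicodedata.lookup(name) for name in NAMES}

for name, ch in chars.items():
    # Each one is a Cf (format) character and renders with no visible glyph.
    print(f"U+{ord(ch):04X} {name} {unicodedata.category(ch)}")

FA, IT, IS = (chars[n] for n in NAMES)

# Hypothetical "f(x) = 2a": function application and product spelled out.
expr = "f" + FA + "(x) = 2" + IT + "a"
# Hypothetical subscript of a matrix entry a_ij: the two indices are separate.
indices = "i" + IS + "j"
```

The characters change nothing visually; they only make explicit the structure a reader infers from layout, which helps software such as screen readers or computer algebra systems parse the expression.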



    This archive was generated by hypermail 2.1.5 : Fri May 23 2003 - 01:50:39 EDT