Re: how to add all latin (and greek) subscripts

From: Asmus Freytag
Date: Thu Jun 26 2008 - 14:53:43 CDT


    On 6/26/2008 11:15 AM, Kent Karlsson wrote:
    >> super- or subscript digits in cases like H2O, m², ¹4C, or 10²³
    >> is even remotely wrong and should use markup instead. That
    >> would be quite unnecessary overkill, when font coverage of
    >> these characters is quite sufficient (and survives markup
    >> stripping, though not compatibility mapping; hoping that
    >> my examples will survive e-mail).
    The formatting required for simple superscripts like exponents or
    chemical formulae is widely supported and does not require MathML.

    The upside of using plain-text superscript characters for simple
    situations in otherwise plain text is that they normally don't suffer
    from translation of format (e.g. HTML to plain text) and thus retain the

    However, that holds only in theory, except for superscript 1, 2 and 3, because
    there's apparently still a lot of transcoding between Unicode and 8859-1
    going on, which would kill all others. Your example is perfect. When I
    first got your message, all super/subscripts were as you intended them,
    but your reply to yourself translated all except the above mentioned
    three into their plain digit equivalents.
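
    The effect can be reproduced with a small sketch. It assumes a
    hypothetical gateway that keeps every character 8859-1 can encode and
    replaces the rest with its compatibility (NFKC) form; the function
    names are illustrative, and the real mail path may have behaved
    differently.

```python
import unicodedata

SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


def survives_latin1(ch: str) -> bool:
    """True if the character can be encoded in ISO 8859-1 (latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def transcode_with_fallback(text: str) -> str:
    """Hypothetical lossy gateway (an assumption, not a documented tool):
    keep what latin-1 can encode, otherwise fall back to the NFKC
    compatibility form of the character."""
    out = []
    for ch in text:
        if survives_latin1(ch):
            out.append(ch)
        else:
            out.append(unicodedata.normalize("NFKC", ch))
    return "".join(out)


survivors = [ch for ch in SUPERSCRIPT_DIGITS if survives_latin1(ch)]
print(survivors)
for sample in ["H₂O", "m²", "¹⁴C", "10²³"]:
    print(sample, "->", transcode_with_fallback(sample))
```

    Only the superscripts 1, 2 and 3 have code points in 8859-1 (U+00B9,
    U+00B2, U+00B3), so under this assumption the subscript in the water
    formula and the superscript 4 fall back to plain digits while the
    others pass through unchanged.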

    Also, I noticed that the superscript 4, when I saw it, came from a
    different font that uses slightly higher and smaller glyphs, making the
    14 look almost like a 1 to the fourth power. Also, the spacing of these
    is just not right when you use a monospaced font. In a true superscript,
    the effective font size would be reduced to match the size of the small
    glyphs, but in a plain text case, the characters must fit into the full
    sized display cell, meaning that using more than one of them will look
    odd (2 3 instead of 23).

    The plain text ones have their uses for quick and dirty footnote symbols
    and for indicating squared units in otherwise non-mathematical texts as
    well as similar *simple* usages. Such fallbacks are best limited to
    single digits of the 8859-1 subset to avoid the surprises you ran into.
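
    As a rough illustration of that guideline, the following sketch flags
    super- and subscript characters that either lie outside 8859-1 or sit
    next to another such character. The helper names and the reason
    strings are made up for this example; it relies only on the
    compatibility decompositions in the Unicode Character Database.

```python
import unicodedata


def is_script_char(ch: str) -> bool:
    """True for characters whose compatibility decomposition is tagged
    <super> or <sub> in the Unicode Character Database."""
    decomp = unicodedata.decomposition(ch)
    return decomp.startswith("<super>") or decomp.startswith("<sub>")


def risky_scripts(text: str):
    """Return (index, char, reason) tuples for super/subscript characters
    that fall outside the 'single character from 8859-1' guideline."""
    problems = []
    for i, ch in enumerate(text):
        if not is_script_char(ch):
            continue
        has_neighbour = (i > 0 and is_script_char(text[i - 1])) or (
            i + 1 < len(text) and is_script_char(text[i + 1])
        )
        if ord(ch) > 0xFF:
            # Not representable in 8859-1, so it may not survive transcoding.
            problems.append((i, ch, "not in 8859-1"))
        elif has_neighbour:
            # Survives transcoding, but spacing suffers in monospaced display.
            problems.append((i, ch, "multi-character run"))
    return problems


for sample in ["m²", "10²³", "¹⁴C"]:
    print(sample, risky_scripts(sample))
```

    A lone squared sign passes, whereas a two-character exponent or
    anything beyond the 8859-1 repertoire is reported.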

    In addition, as you had noted earlier, the full repertoire of super- and
    subscript characters is the proper choice for phonetic notations (e.g.
    digits used as tone marks). Such notations require preservation of
    specific semantics across formatting languages. They require much more
    extensive Unicode support as well as special fonts, and they wouldn't
    survive transcoding anyway, meaning the issues you encountered with your
    examples aren't as relevant in that field of application.


    PS: In the late 1990s a request had been forwarded from people
    maintaining a chemical database to add a small number of additional
    Greek subscripts. The rationale was that their type of database was not
    able to handle any markup. The request never went anywhere, for lack of
    specific input from the submitters beyond an initial discussion, and it
    is unknown how they solved their problem. The database was intended for
    regulatory purposes, so one assumes that some solution was found, but
    there has been no information.
