From: Ondrej Certik (email@example.com)
Date: Fri Jun 27 2008 - 02:30:58 CDT
First, thanks to Xun, Phillips, Johannes, Kent and Asmus for your
feedback. My comments are below.
On Thu, Jun 26, 2008 at 9:53 PM, Asmus Freytag <firstname.lastname@example.org> wrote:
> On 6/26/2008 11:15 AM, Kent Karlsson wrote:
>>> super- or subscript digits in cases like H2O, m², ¹4C, or 10²³
>>> is even remotely wrong and should use markup instead. That
>>> would be quite unnecessary overkill, when font coverage of
>>> these characters is quite sufficient (and survives markup
>>> stripping, though not compatibility mapping; hoping that
>>> my examples will survive e-mail).
> The formatting required for simple superscripts like exponents or chemical
> formulae is widely supported and does not require MathML.
> The upside of using plain-text superscript characters for simple situations
> in otherwise plain text is that they normally don't suffer from translation
> of format (e.g. HTML to plain text) and thus retain the semantic.
> However, that's theory except for superscript 1, 2 and 3 - because there's
> apparently still a lot of transcoding between Unicode and 8859-1 going on,
> which would kill all others. Your example is perfect. When I first got your
> message, all super/subscripts were as you intended them, but your reply to
> yourself translated all except the above mentioned three into their plain
> digit equivalents.
Replying to Kent's original email works for me. Anyway, this is not
important for the present discussion: your email client should clearly
handle UTF-8, and then all would be fine; otherwise it's a bug in the
client.
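
The transcoding and compatibility-mapping points above can be checked
with a minimal Python sketch. The helper name survives_latin1 is
hypothetical and only for illustration; it assumes that a lossy
Unicode to 8859-1 step is what damaged the message.

```python
import unicodedata

SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"

def survives_latin1(ch: str) -> bool:
    """Return True if ch can be encoded in ISO 8859-1 without loss."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Only the superscripts that live in the Latin-1 block survive.
survivors = [ch for ch in SUPERSCRIPT_DIGITS if survives_latin1(ch)]
print(survivors)

# Compatibility normalization (NFKC) replaces superscripts with plain digits.
print(unicodedata.normalize("NFKC", "10²³"))
```

Only superscript 1, 2 and 3 are encodable in 8859-1, and NFKC turns
10²³ into the plain string 1023, which is the loss of meaning Kent and
Asmus describe.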
> Also, I noticed that the superscript 4 when I saw it, came from a different
> font that uses slightly higher and smaller glyphs, making the 14 look almost
> like a 1 to the fourth power. Also, the spacing of these is just not right
> when you use a monospaced font. In a true superscript, the effective font
> size would be reduced to match the size of the small glyphs, but in a plain
> text case, the characters must fit into the full sized display cell, meaning
> that using more than one of them will look odd (2 3 instead of 23).
For me it looks good in the browser (on Debian). I then pasted it into
the terminal, and everything looks perfectly fine there too (yes, with
the small disadvantage of the fixed-width font).
> The plain text ones have their uses for quick and dirty footnote symbols and
> for indicating squared units in otherwise non-mathematical texts as well as
> similar *simple* usages. Such fallbacks are best limited to single digits of
> the 8859-1 subset to avoid the surprises you ran into.
> In addition, as you had noted earlier, the full repertoire of super and
> subscript characters are the proper choice for phonetic notations (e.g.
> digits used as tone marks). Such notations require preservation of specific
> semantics across formatting languages. They require much more extensive
> Unicode support as well as special fonts, and they wouldn't survive
> transcoding anyway, meaning the issues you encountered with your examples
> aren't as relevant in that field of application.
> PS: in the late 90's a request had been forwarded from people maintaining a
> chemical database to add a small number of additional Greek subscripts. The
> rationale was that the type of database was not able to handle any markup.
> The request never went anywhere, for lack of specific input from the
> submitters beyond an initial discussion, and it is unknown how they solved
> their problem. The database was intended for regulatory purposes, so one
> assumes that some solution was found, but there has been no information.
For general mathematical formulas, one needs to use TeX or a similar
system (MathML for example, but the current rendering engines for
MathML, like those in browsers, do not look as good as TeX). Of course
we support this in sympy, but what I am asking for is to improve the
experience in the terminal, because you cannot use TeX or MathML in
the terminal (those require a full graphical front end, like a browser
or a windows application).
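
As a rough sketch of what plain-text output can already do with the
existing superscript digits (this is not sympy's actual printer; the
function name power is hypothetical):

```python
# Map ASCII digits and the minus sign to their Unicode superscript forms.
SUPERSCRIPTS = str.maketrans("0123456789-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁻")

def power(base: str, exponent: int) -> str:
    """Render base**exponent using superscript characters only."""
    return base + str(exponent).translate(SUPERSCRIPTS)

print(power("x", 2), "+", power("10", 23), "+", power("m", -1))
```

This works for integer exponents because every digit and the minus
sign have superscript forms; symbolic exponents need superscript
letters.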
To give you the idea what I mean, look at these examples:
(especially the screenshots of the terminals at the end). See also
this thread for the background why we want that:
The observation is that one can take advantage of Unicode and print a
surprisingly large number of formulas in plain text (terminal) mode. E.g.:
but as I said, some characters are missing. As I understand it, Unicode
still has a lot of free space to add more characters, right? Is there
some technical problem with that? If not, let's discuss the
philosophical issues: you can do all superscripts except "q". I
understand that could be for historical reasons, but anyway, let's
just add "q" somewhere and be done with it. Then let's add all the
missing Latin letters to the subscripts; there are already 8 of them,
so let's add the rest too. And then the same for Greek super- and
subscripts.
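
One way to check which letters are covered is to scan the character
database for compatibility decompositions tagged <super> or <sub>.
This is a minimal sketch, and the function name letters_with_form is
hypothetical. The result reflects the Unicode version bundled with the
Python interpreter, which may be newer than the one discussed here.

```python
import sys
import unicodedata
from string import ascii_lowercase

def letters_with_form(tag: str) -> dict:
    """Map each lowercase Latin letter to the characters whose
    compatibility decomposition is '<tag> LETTER' (e.g. tag='<super>')."""
    found = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == tag:
            base = chr(int(parts[1], 16))
            if base in ascii_lowercase:
                found.setdefault(base, []).append(chr(cp))
    return found

supers = letters_with_form("<super>")
subs = letters_with_form("<sub>")

print("Unicode version:", unicodedata.unidata_version)
print("missing superscripts:", sorted(set(ascii_lowercase) - set(supers)))
print("missing subscripts:  ", sorted(set(ascii_lowercase) - set(subs)))
```

Note that the ordinal indicators ª and º also carry the <super> tag,
so they show up next to the modifier letters for "a" and "o".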
Some of you objected (if I understand correctly) that one should not
use sub- or superscripts, because those are meant only for backward
compatibility, and that one should use markup instead. Well, as Kent
has remarked, they are useful in many cases. That's why all the
numbers were added. The Latin (and Greek) letters would be *very*
useful for math, because you can represent tensors easily with them.
If there were no Latin/Greek sub/superscripts in Unicode at all, I
would understand the objection. But in the present situation, where
the support is clearly already there, only half finished, it seems to
me that the best way forward is to finish the support for all
Latin/Greek sub/superscripts.
What do you think? If you are not opposed and agree with me that it
should be done, I'd like to do the work. I'd appreciate any pointers
on what I should do.
If you don't agree, I'd like to discuss it. :)
This archive was generated by hypermail 2.1.5 : Fri Jun 27 2008 - 10:34:24 CDT