Re: Why incomplete subscript/superscript alphabet ?

From: Philippe Verdy <>
Date: Fri, 30 Sep 2016 18:36:22 +0200

2016-09-30 17:54 GMT+02:00 Jukka K. Korpela <>:

> Using HTML, for example, the way to achieve that at present would be to
> use markup like <span class="sub">...</span> (to avoid the problems caused
> by the default formatting of <sub> and <sup>) and to use a CSS style sheet
> that sets font-family suitably and uses OpenType font feature settings to
> select subscript or superscript glyphs. In practice, you would need to use
> @font-face to embed a suitable OpenType font. So it’s doable, but not
> trivial like just slapping <sub> and </sub> around some text.

Not needed. The <sub> and <sup> elements in HTML can be styled directly as
well (also with CSS), with clear implied semantics, without needing to
create a custom class on a non-semantic <span> element.
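
A minimal sketch of that idea follows. Python is used only to assemble the page. The CSS property `font-variant-position` and the OpenType `subs`/`sups` features it relies on are assumptions not named in the message, and the function name `formula_page` is hypothetical. The rule targets the semantic <sub> and <sup> elements themselves, so no custom class or <span> is involved. The final rendering still depends on browser support and on the font actually providing those glyphs.

```python
# Hypothetical helper: builds an HTML page whose <sub>/<sup> elements are
# styled directly with CSS instead of through a custom <span class="sub">.
STYLE = """\
sub, sup {
  /* Undo the default shrunk, raised/lowered rendering... */
  vertical-align: baseline;
  font-size: inherit;
}
/* ...and ask the font for its own subscript/superscript glyphs
   (OpenType 'subs'/'sups'). Where the property or the glyphs are
   unavailable, rendering may fall back to plain baseline text. */
sub { font-variant-position: sub; }
sup { font-variant-position: super; }
"""


def formula_page(body_html: str) -> str:
    """Wrap a formula fragment in a minimal HTML document using STYLE."""
    return (
        "<!DOCTYPE html>\n<html><head><style>\n"
        + STYLE
        + "</style></head>\n<body><p>"
        + body_html
        + "</p></body></html>\n"
    )


page = formula_page("x<sub>i</sub> = a<sup>2</sup> + b<sup>2</sup>")
print(page)
```
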
Here the intent in the formula was clearly to designate a subscript
notation (as opposed to a superscript, whose meaning after the symbol of a
variable in a formula is generally an exponent). Superscripts after other
symbols (such as a summation operator) generally designate something else
(an upper bound). After some operators such as "C", a superscript denotes
the cardinality of the set from which all possible unordered combinations
(distinct subsets) are counted. In chemical formulas, superscripts and
subscripts are used before or after an element to indicate some physical
state (total charge, charge of the nucleus, total weight, 3D configuration
for compound elements and crystalline forms, orientation, number of
occurrences of subgroups in complex compounds...).

In formulas, superscripts and subscripts are parsed according to the
context in which they occur (which remaps these subscripts or superscripts
by assigning them a specific role), but on their own they are just
subscripts/superscripts with no other semantics added (while still keeping
all the semantics of their content).
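
The context-dependent parsing described above can be sketched as a lookup keyed on what precedes the script. This is a hypothetical illustration: the category names, the `role_of` function and the role labels are invented for the example and cover only a few of the cases mentioned in this message. A script whose context is unknown keeps only its positional meaning.

```python
# Hypothetical roles for a script, keyed on (kind of preceding symbol, position).
ROLES = {
    ("variable", "sub"): "index",
    ("variable", "sup"): "exponent",
    ("summation", "sub"): "lower bound",
    ("summation", "sup"): "upper bound",
    ("chemical element", "sub"): "count of atoms or subgroups",
    ("chemical element", "sup"): "charge or other state annotation",
}

POSITION_NAMES = {"sub": "subscript", "sup": "superscript"}


def role_of(base_kind: str, position: str) -> str:
    """Return the role a script takes in context, or just its position if no rule applies."""
    if position not in POSITION_NAMES:
        raise ValueError(f"unknown position: {position!r}")
    return ROLES.get((base_kind, position), "plain " + POSITION_NAMES[position])


print(role_of("variable", "sup"))   # exponent
print(role_of("summation", "sup"))  # upper bound
print(role_of("text", "sub"))       # plain subscript
```
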

For complex compounds, these subscripts/superscripts are not enough and
specific layouts and symbols are needed, but you cannot represent them in
simple linear plain text without defining a specific notation convention
and annotation terms inserted in the custom formula.
Plain-text encoding will not solve the problem of representation at the
character level: you'll need a higher-level protocol. There are infinitely
many ways to define such protocols, but they are out of scope for Unicode,
which will not encode them (just as it does not encode orthographic
conventions or script conventions for specific languages: the conventions
for technical notations amount to languages of their own).
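
As a hedged illustration of such a higher-level protocol, the Python sketch below defines a hypothetical linear convention (not any existing standard) in which `_{...}` marks a subscript and `^{...}` a superscript, and translates it into HTML <sub>/<sup> markup. The notation and the `to_html` name are assumptions made for the example; real chemical or mathematical notations need far richer layout than this.

```python
import html
import re

# Hypothetical linear convention (not a standard): "_{...}" marks a subscript,
# "^{...}" a superscript. Nested braces are not supported in this sketch.
SCRIPT = re.compile(r"([_^])\{([^{}]*)\}")


def to_html(formula: str) -> str:
    """Translate the hypothetical notation into HTML <sub>/<sup> markup."""
    out = []
    pos = 0
    for m in SCRIPT.finditer(formula):
        # Escape the ordinary text between script markers.
        out.append(html.escape(formula[pos:m.start()]))
        tag = "sub" if m.group(1) == "_" else "sup"
        out.append(f"<{tag}>{html.escape(m.group(2))}</{tag}>")
        pos = m.end()
    out.append(html.escape(formula[pos:]))
    return "".join(out)


print(to_html("SO_{4}^{2-}"))  # SO<sub>4</sub><sup>2-</sup>
```
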
Received on Fri Sep 30 2016 - 11:37:00 CDT
