Superscript and Subscript Characters in General Use (was: Re: a character for an unknown character)

From: Marcel Schneider <>
Date: Wed, 4 Jan 2017 01:24:52 +0100 (CET)

On Tue, 3 Jan 2017 09:31:42 +0100, Christoph Päper wrote:

> > Among the possibilities, you include Unicode subscripts.
> Just for the sake of completeness.

This suggests that preformatted subscripts really are an option here.
The TUS snippets [1][2] and common practice show that whatever characters are
on the keyboard get used or re-used as superscripts, such as the degree sign
as superscript o, and the feminine ordinal indicator as superscript a. Layouts
are bafflingly inconsistent across countries: the Belgian AZERTY layout has
SUPERSCRIPT THREE where its French (France) counterpart has an empty shift state,
while SUPERSCRIPT ONE is missing on both, even though the AltGr shift state is
only partially used and all three superscript digits are part of Latin-1. Thus,
awareness of the usefulness of a given character is not always closely related
to its presence on the keyboard.
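
The Latin-1 point can be checked with a minimal Python sketch. The helper name
in_latin1 and the choice of sample characters are mine, for illustration only:

```python
import unicodedata

# The three Latin-1 superscript digits, plus the degree sign and the
# feminine ordinal indicator that are often reused as superscripts.
CANDIDATES = ["\u00B9", "\u00B2", "\u00B3", "\u00B0", "\u00AA"]


def in_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in ISO 8859-1 (Latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in CANDIDATES:
    # Print code point, formal character name, and Latin-1 membership.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: Latin-1 = {in_latin1(ch)}")
```

By contrast, SUPERSCRIPT FOUR (U+2074) and the other later additions fall
outside Latin-1, so in_latin1("\u2074") returns False.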

In the Unicode era, this may lead to the insight that the availability
of an almost complete range of superscripts, and of a set of subscripts, including
Latin letters, calls for adding them to national keyboard layouts to cater
for the demand of increasingly important user groups and communities. Supporting
this does not necessarily require the Unicode Standard to be reworded, because
TUS mainly reflects encoding principles and usage recommendations, without being
a typography manual.

TUS 9.0, §22.4, p. 786, explains that the recommendation not to use preformatted
characters outside phonetics is a mere application of a design principle,
regardless of the practical usefulness of the scheme. I note that in the snippet
quoted below, the digits in “DC0016” have already lost their formatting through
copy-pasting to plain text. By contrast, copying it from Adobe Reader to
Microsoft Word carries the font size difference over, but not the vertical
alignment, presumably
because the original specifies a custom subscript style that has no generic
subscripting information and is not cross-platform compatible. This example
highlights a serious downside of the markup-based representation scheme.
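
A rough Python sketch of that downside, under stated assumptions: HTML-like
<sub> tags stand in for rich-text styling, to_plain_text is a hypothetical
stand-in for a copy-paste to plain text, and which digits are subscripted is
chosen arbitrarily for the demonstration:

```python
import re


def to_plain_text(rich: str) -> str:
    """Drop HTML-like tags, as a crude model of pasting into plain text."""
    return re.sub(r"</?[A-Za-z][^>]*>", "", rich)


# Markup-based representation: the subscript lives only in the tags.
marked_up = "DC<sub>0016</sub>"

# Preformatted representation: the subscript lives in the characters
# themselves (U+2080 SUBSCRIPT ZERO .. U+2089 SUBSCRIPT NINE).
preformatted = "DC\u2080\u2080\u2081\u2086"

print(to_plain_text(marked_up))     # the subscript information is gone
print(to_plain_text(preformatted))  # the subscript characters survive
```

The first string comes out as plain digits, while the second passes through
unchanged because nothing in it depends on styling.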

As demonstrated with the apostrophe, a recommendation may be changed according
to common practice, and reconsidered in the light of differently weighed rules
and principles, in favor of what Asmus Freytag pointed out on December 28ᵗʰ, 2016,
in reply to Richard Wordingham:

> > > > Ideal solutions can also be defeated by limited keyboard layouts. As a
> > > > result, I have no idea whether the singular of "fithp" (one of Larry
> > > > Niven's alien species) should be spelt with U+02BC or U+2019, though in
> > > > ASCII I can just write "fi'".
> > >
> > > The only place where "uni" doesn't apply in Unicode is that there's never
> > > just a single principle that applies, but always multiple ones that are
> > > in tension --- and in the edge cases, the tension can be felt keenly.
> > >

As seen in another example, in a 2015 thread on plain-text custom fractions,
the English Microsoft Community website hosts recommendations on how to
insert fractions made of superscripts, subscripts and the fraction slash U+2044,
using a list of autocorrections in Word. To test, Iʼve added to the autocorrect
list four items converting '.s.' to 'ˢᵗ', '.n.' to 'ⁿᵈ', '.r.' to 'ʳᵈ', '.t.' to 'ᵗʰ'.
The result looks fine in Cambria and bad in incomplete fonts mixed with a
fallback font, while Arial renders the superscript 'n' in a non-standard way,
as a legacy remnant, despite TUS specifying that all those characters
should be harmonized.
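
For readers without Word at hand, the same idea can be sketched in Python.
This is only an approximation: the simple substring replacement does not
reproduce Wordʼs autocorrect matching rules, and the names autocorrect and
plain_text_fraction are hypothetical:

```python
# The four autocorrect entries described above, as escaped code points.
AUTOCORRECT = {
    ".s.": "\u02E2\u1D57",  # MODIFIER LETTER SMALL S + SMALL T
    ".n.": "\u207F\u1D48",  # SUPERSCRIPT LATIN SMALL LETTER N + MODIFIER SMALL D
    ".r.": "\u02B3\u1D48",  # MODIFIER LETTER SMALL R + SMALL D
    ".t.": "\u1D57\u02B0",  # MODIFIER LETTER SMALL T + SMALL H
}

# Digit translation tables; superscript 1, 2, 3 are the Latin-1 characters.
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079",
)
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089",
)
FRACTION_SLASH = "\u2044"


def autocorrect(text: str) -> str:
    """Replace every trigger sequence with its superscript ordinal suffix."""
    for trigger, replacement in AUTOCORRECT.items():
        text = text.replace(trigger, replacement)
    return text


def plain_text_fraction(numerator: int, denominator: int) -> str:
    """Build superscript numerator + U+2044 + subscript denominator."""
    return (
        str(numerator).translate(SUPERSCRIPT_DIGITS)
        + FRACTION_SLASH
        + str(denominator).translate(SUBSCRIPT_DIGITS)
    )


print(autocorrect("December 28.t., 2016"))
print(plain_text_fraction(3, 16))
```

How the output looks still depends entirely on the font, as noted above.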

Itʼs up to the user to choose the best-fitting option depending on usage
and environment. As already discussed, formatting is a working solution
provided that plain text will never be a requirement.

I hope that this lengthy contribution may help clear the way for
users to feel free to use superscript and subscript characters the way
they prefer.


[1] TUS 9.0, §22.4, p. 786:
| In general, the Unicode Standard does not attempt to describe the positioning
| of a character above or below the baseline in typographical layout.
| Therefore, the preferred means to encode superscripted letters or digits,
| such as “1st” or “DC0016”, is by style or markup in rich text. […]
| In addition, superscript digits are used to indicate tone in transliteration
| of many languages. The use of superscript two and superscript three is common
| legacy practice when referring to units of area and volume in general texts.

[2] TUS 9.0, §7.8, p. 327:
| The superscript forms of the i and n letters can be found in the
| Superscripts and Subscripts block (U+2070..U+209F). The fact that the latter
| two letters contain the word “superscript” in their names instead of “modifier
| letter” is an historical artifact of original sources for the characters, and
| is not intended to convey a functional distinction in the use of these
| characters in the Unicode Standard.
| Superscript modifier letters are intended for cases where the letters carry
| a specific meaning, as in phonetic transcription systems, and are not
| a substitute for generic styling mechanisms for superscripting of text,
| as for footnotes, mathematical and chemical expressions, and the like.

Received on Tue Jan 03 2017 - 18:26:04 CST

