Re-use of Modifier Letters for Superscript Abbreviations (was: Re: a character for an unknown character) from Marcel Schneider on 2016-12-31 (Unicode Mail List Archive)

From: Marcel Schneider <charupdate_at_orange.fr>
Date: Sat, 31 Dec 2016 11:45:02 +0100 (CET)

On Sat, 31 Dec 2016 09:20:30 +0000, Richard Wordingham wrote:
[…]
> It's in a different universe, restricted to one book, namely Footfall.

Thank you for the reference.

[…]
> Did you look in the article about Klingon, namely
> https://en.wikipedia.org/wiki/Klingon_language , or
> in the article about Klingons, namely
> https://en.wikipedia.org/wiki/Klingon ? The quote is from the former.

I looked up the wrong one; I didnʼt think of the language article.
Thanks for the link.

Iʼm now returning to another quotation of yours, to spin off a new thread
on a topic about which I urgently need to gather more information:

On Fri, 30 Dec 2016 22:17:12 +0000, Richard Wordingham wrote:
>
> On Fri, 30 Dec 2016 20:13:41 +0100 (CET) Marcel Schneider wrote:
> >
> > > U+2E31 WORD SEPARATOR MIDDLE DOT
> > > U+30FB KATAKANA MIDDLE DOT
> >
> > These seem to me identical to U+00B7 and U+2022 respectively. Perhaps
> > weʼre here faced with two examples of what Asmus referred to as
> > “incorrectly encoded more than once” (talking of “Many other "simple"
> > marks: lines, circles, triangles, hooks, and squares, or groups of
> > them”).
>
> I was talking about what "fuels the misperception that Unicode somehow
> encodes symbols based on a single conventional usage".

I persist in believing that particular scripts like Avestan and Samaritan Aramaic
can require special characters like the WORD SEPARATOR MIDDLE DOT. A wish to avoid
fueling a misperception of Unicode character encoding could not have driven the UTC
to reject it (for version 5.2). The KATAKANA MIDDLE DOT, in turn, has been part of
the standard since the beginning, like the BULLET. I imagine that a generic bullet
may not be suitable for Katakana.
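
As a rough check of how the four dots compare, here is a minimal Python sketch
that assumes only the standard unicodedata module. The list DOTS and the helper
describe() are names made up for this illustration. It prints each characterʼs
name, general category and East Asian width, then tests whether compatibility
normalization merges any of them.

```python
import unicodedata

# The four dots under discussion (DOTS is a hypothetical name for this sketch).
DOTS = ["\u00B7", "\u2E31", "\u2022", "\u30FB"]

def describe(ch):
    """Return (code point, name, general category, East Asian width)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )

for ch in DOTS:
    print(describe(ch))

# Collect the NFKC forms; if normalization unified any two dots,
# this set would hold fewer than four entries.
normalized = {unicodedata.normalize("NFKC", ch) for ch in DOTS}
print(len(normalized))
```

All four share the same general category, but they differ in other properties
such as East Asian width, and normalization keeps them apart.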

To get an idea of how character encoding works, people wonʼt look at scripts they
donʼt know. Given that there is a misperception, one way to avoid fueling it could
be to encourage character re-use. In practice, however, re-use is rather
discouraged, as in the example of Latin modifier letters, which are (basically)
preformatted superscripts. TUS states that there is no functional difference
between those that have the word SUPERSCRIPT in their name and those that donʼt:

TUS 9.0, §7.8, p. 327:
| The superscript forms of the i and n letters can be found in the
| Superscripts and Subscripts block (U+2070..U+209F). The fact that the latter
| two letters contain the word “superscript” in their names instead of “modifier
| letter” is an historical artifact of original sources for the characters, and
| is not intended to convey a functional distinction in the use of these
| characters in the Unicode Standard.
http://www.unicode.org/versions/Unicode9.0.0/ch07.pdf#G24762
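
The character properties can be inspected directly. The following is a minimal
Python sketch assuming only the standard unicodedata module; SAMPLES and
profile() are hypothetical names chosen for this illustration. It compares the
two letters named SUPERSCRIPT with two ordinary modifier letters.

```python
import unicodedata

# Two letters with SUPERSCRIPT in their names, and two MODIFIER LETTERs.
SAMPLES = ["\u2071", "\u207F", "\u02B0", "\u02B2"]

def profile(ch):
    """Return (name, general category, decomposition mapping) for ch."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", *profile(ch))
```

In the character database shipped with current Python versions, all four
report the category Lm and a <super> compatibility decomposition; only the
names differ.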

That is probably intended to discourage their use as superscripts.
Superscript digits too are confined to phonetics, and the use of superscript two
and three in measurement units is merely tolerated, not encouraged:

TUS 9.0, §22.4, p. 786:
| In addition, superscript digits are used to indicate tone in transliteration
| of many languages. The use of superscript two and superscript three is common
| legacy practice when referring to units of area and volume in general texts.
http://www.unicode.org/versions/Unicode9.0.0/ch22.pdf#G42931

Consequently, the notation of the acceleration unit 'ms⁻²' doesnʼt seem to be
supported by Unicode, though it may be considered a technical notation, which
would be a reason to allow it.
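
One practical consequence of writing the unit with preformatted superscripts
can be sketched in Python, assuming only the standard unicodedata module (the
variable names are illustrative). Both SUPERSCRIPT MINUS and SUPERSCRIPT TWO
carry <super> compatibility decompositions, so compatibility normalization
flattens the exponent.

```python
import unicodedata

# 'ms⁻²' spelled with U+207B SUPERSCRIPT MINUS and U+00B2 SUPERSCRIPT TWO.
unit = "ms\u207B\u00B2"

# Canonical normalization leaves the superscripts untouched ...
nfc = unicodedata.normalize("NFC", unit)

# ... while compatibility normalization replaces them with
# U+2212 MINUS SIGN and the ASCII digit 2, losing the raised position.
nfkc = unicodedata.normalize("NFKC", unit)

print(nfc == unit)
print([f"U+{ord(c):04X}" for c in nfkc])
```

Any process that applies NFKC or NFKD to the text therefore turns the exponent
into ordinary baseline characters.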

These examples are intended to demonstrate the ambiguity of the recommendation
to use markup and rich text format whenever vertical alignment matters, except
in phonetics. I suspect that political correctness with respect to non-Latin
scripts may have biased Unicode’s policy, whereas Western Arabic digits and
Latin letters are probably the only characters to be used extensively in
super- and subscript position.

As a result, the misperception of Unicode as a one-codepoint-per-usage standard
is fueled even further, and I can now better understand why our NB intended to
have French ordinal indicator(s) encoded in Unicode alongside the already
existing superscript Latin small letter(s).
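
To make the re-use concrete, here is a minimal Python sketch of French ordinal
abbreviations composed from existing modifier letters. The mapping MODIFIER and
the helper ordinal() are hypothetical names for this illustration, and the
choice of the letters e, r, s and d is an assumption about which suffix letters
are needed; it is not taken from any keyboard specification.

```python
import unicodedata

# Existing modifier letters that could serve as superscript suffix letters.
MODIFIER = {
    "e": "\u1D49",  # MODIFIER LETTER SMALL E
    "r": "\u02B3",  # MODIFIER LETTER SMALL R
    "s": "\u02E2",  # MODIFIER LETTER SMALL S
    "d": "\u1D48",  # MODIFIER LETTER SMALL D
}

def ordinal(number, suffix):
    """Append the suffix to the number, using modifier letters."""
    return str(number) + "".join(MODIFIER[c] for c in suffix)

premier = ordinal(1, "er")
print(premier)

# Compatibility normalization folds the abbreviation back to plain letters.
print(unicodedata.normalize("NFKC", premier))
```

The last line shows that the plain-letter spelling remains recoverable from
such text.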

But granting that encoding new French ordinal indicators is a really good idea,
Iʼm curious about the response of the UTC. However, given that the regular process
will take two years, would Unicode agree that, in the meantime, the modifier
letters be put in their place on the forthcoming keyboard layout?

Marcel
Received on Sat Dec 31 2016 - 04:45:50 CST
