From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Mar 20 2010 - 02:25:05 CST
Benjamin Rossen wrote:
> A small number of characters are included in the UNICODE standard as
> superscripted and subscripted characters, most of them in the 2070 -
> 209F block.
Not that small... if you define these concepts as referring to characters
with <super> or <sub> in their compatibility decomposition, respectively,
there are currently 142 superscripted and 30 subscripted characters. (Oddly,
the great Unibook tool does not seem to have a function for getting such
statistics, though it lets one highlight those characters. So I did a simple
grep | wc -l on UnicodeData.txt.)
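A rough equivalent of that count can be sketched in Python, assuming the standard unicodedata module stands in for a local copy of UnicodeData.txt. The function name below is made up for illustration, and the totals depend on the Unicode version bundled with the interpreter, so they need not match the figures quoted above.

```python
import sys
import unicodedata

def count_by_decomposition_tag(tag):
    """Count code points whose decomposition mapping starts with the given tag."""
    count = 0
    for cp in range(sys.maxunicode + 1):
        # decomposition() returns e.g. "<super> 0032" for U+00B2, or "" if none.
        if unicodedata.decomposition(chr(cp)).startswith(tag):
            count += 1
    return count

supers = count_by_decomposition_tag("<super>")
subs = count_by_decomposition_tag("<sub>")
print("Unicode", unicodedata.unidata_version, "super:", supers, "sub:", subs)
```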
> The current policy is to regard these characters as
> duplications of their standard characters.
That’s a surprisingly common misconception, even among Unicode-aware people.
One reason for the misconception is that for some (“stylistic”) uses of
superscripting and subscripting, as in writing the “st” in “1st”, you should
surely use formatting tools instead of looking for a superscript or
subscript character (and you wouldn’t even find such characters). The point
is that Unicode does not contain such characters except for compatibility
with other character codes or because superscripting or subscripting has a
semantic role.
Check the description in the Unicode Standard, somewhat strangely hidden
under “Number Forms” in the “Symbols” chapter:
http://www.unicode.org/versions/Unicode5.2.0/ch15.pdf#page=13
This isn’t a simple issue, though. We cannot generally be expected to use
superscript characters for exponents in mathematical expressions, for
example. We can write 2³, but it would be absurd to require that all
characters you might wish to use in exponents be included in Unicode as
superscript characters.
I guess the point here is that mathematical expressions are, by their very
nature, not plain text. They have internal structure that needs to be
expressed somehow, visually using e.g. superscripting, or in XML markup, or
some other way. If such indications are simply omitted, the meaning is
changed or lost, in general: x to the power of y becomes xy. Therefore,
“flattening” or “linearizing” a mathematical expression into plain text
needs to introduce some extra notations or annotations, effectively creating
special markup or special operators, like x<pow>y</pow> or x**y or x^y, or
changing the mode of expression, e.g. pow(x,y). In a “flattening” process,
superscript characters may have some use.
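As an illustration only, here is a minimal sketch of such a flattening step in Python. The helper name flatten_power and the choice to cover only digit exponents are assumptions made for this example. Anything else falls back to the x^y operator notation, since most characters have no superscript counterpart in Unicode.

```python
# Hypothetical mapping: only the ten digits have their superscript forms listed here.
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

def flatten_power(base, exponent):
    """Render 'base to the power exponent' as plain text."""
    if exponent and all(ch in "0123456789" for ch in exponent):
        # Digit-only exponent: superscript characters can carry the meaning.
        return base + exponent.translate(SUPERSCRIPT_DIGITS)
    # Otherwise keep the structure explicit with an operator.
    return base + "^" + exponent

print(flatten_power("2", "3"))
print(flatten_power("x", "y"))
```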
See also “Unicode in XML and other Markup Languages”, Unicode Technical
Report #20, section 5.6 Superscripts and Subscripts:
http://unicode.org/reports/tr20/#Superscripts
However, there are often strong practical reasons to use formatting or
markup instead of superscript or subscript characters, even when the latter
would be more appropriate in principle. For example, in a phonetic notation
like [nʲet], where “ʲ” (U+02B2) indicates palatalization, it might be wise
to replace that character by normal “j” formatted or marked up as a
superscript, even though this implies that when treated as plain text, the
notation becomes [njet], which is semantically wrong (though might still be
understood sufficiently well).
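The loss can be made concrete with a small Python sketch. It assumes HTML-style <sup> markup, which the discussion above does not name, and a deliberately naive tag stripper written only for this example.

```python
import re

def strip_markup(text):
    """Naively drop anything that looks like a tag (illustration only)."""
    return re.sub(r"<[^>]+>", "", text)

with_character = "[n\u02b2et]"      # U+02B2 carries the palatalization mark
with_markup = "[n<sup>j</sup>et]"   # assumed markup alternative

print(strip_markup(with_character))  # the modifier letter survives
print(strip_markup(with_markup))     # the superscripting is gone
```

Once the markup is removed, nothing distinguishes the palatalization mark from an ordinary “j”, whereas the plain-text form with U+02B2 passes through unchanged.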
> There are no plans to put more
> superscripted and subscripted characters into the UNICODE standard.
Can you cite a reference for that statement? I see no reason why such
characters would not be added, when existing usage can be demonstrated where
the character semantically differs from the corresponding normal character.
> There is at least one good reason for providing an enlarged set of
> unicode superscripted and subscripted characters. Formatting tools
> are not always available, and in some cases when they are available
> the tools require the deployment of APIs and editing tools that may
> be less than optimal.
I don’t think that constitutes a reason for including anything into the
Unicode Standard. Expanding the standard with new “characters” for such
reasons would have no end. It is also a very impractical approach. It surely
takes much more time to have a large set of characters added to Unicode and
at least some fonts and supported by relevant software than to just fix some
software that now has difficulties with, say, superscripting.
> The superscripted x is not available in UNICODE,
There is a superscripted x, namely U+02E3, MODIFIER LETTER SMALL X,
“ˣ”. I don’t know why it has been included and what it is used for, but I
would guess it is used in some phonetic notations, or maybe in the writing
system(s) of some small language(s). Technically it is defined as having the
compatibility decomposition “<super> 0078”, i.e. letter “x” in superscript
style, but this does not imply that it would be wise to use as superscript x
in general, e.g. when expressing exp(x) as e raised to the power x. One
practical reason for this is that you should expect font designers to draw
the character so that its design matches its intended scope of usage. Another
practical reason is that its font support is relatively limited, thereby
limiting your typographic design as a whole.
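The properties mentioned above can be inspected with Python's standard unicodedata module. This is a small sketch for checking the data, not a recommendation on usage.

```python
import unicodedata

ch = "\u02e3"
name = unicodedata.name(ch)
decomposition = unicodedata.decomposition(ch)
category = unicodedata.category(ch)

print(name)           # MODIFIER LETTER SMALL X
print(decomposition)  # <super> 0078
print(category)       # Lm (modifier letter)

# Compatibility normalization discards the superscript distinction.
print(unicodedata.normalize("NFKC", "e" + ch))  # ex
```

The general category Lm marks the character as a modifier letter, whereas a superscript digit such as U+00B2 has category No. This fits the guess that it was encoded for use as a letter-like element rather than as a general-purpose exponent.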
-- Yucca, http://www.cs.tut.fi/~jkorpela/