Re: super- and subscripted characters

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Mar 20 2010 - 02:25:05 CST


    Benjamin Rossen wrote:

    > A small number of characters are included in the UNICODE standard as
    > superscripted and subscripted characters, most of them in the 2070 -
    > 209F block.

    Not that small... if you define these concepts as referring to characters
    with <super> or <sub> in their compatibility decomposition, respectively,
    there are currently 142 superscripted and 30 subscripted characters. (Oddly,
    the great Unibook tool does not seem to have a function for getting such
    statistics, though it lets one highlight those characters. So I did a simple
    grep | wc -l on UnicodeData.txt.)
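
A minimal sketch of the same kind of count, assuming Python's standard unicodedata module rather than the grep pipeline mentioned above. The helper name count_by_tag is made up for illustration. The totals depend on the Unicode version bundled with the interpreter, so they need not match the figures quoted here.

```python
import sys
import unicodedata

def count_by_tag(tag):
    """Count code points whose decomposition mapping starts with the given tag."""
    total = 0
    for cp in range(sys.maxunicode + 1):
        # decomposition() returns "" for characters without a mapping
        if unicodedata.decomposition(chr(cp)).startswith(tag):
            total += 1
    return total

supers = count_by_tag("<super>")
subs = count_by_tag("<sub>")

# Report the Unicode data version alongside the counts, since they change over time.
print(unicodedata.unidata_version, supers, subs)
```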

    > The current policy is to regard these characters as
    > duplications of their standard characters.

    That’s a surprisingly common misconception, even among Unicode-aware people.
    One reason for the misconception is that for some (“stylistic”) uses of
    superscripting and subscripting, as in writing the “st” in “1st”, you should
    surely use formatting tools instead of looking for a superscript or
    subscript character (and you wouldn’t even find such characters). The point
    is that Unicode does not contain such characters except for compatibility
    with other character codes or because superscripting or subscripting has a
    semantic role.

    Check the description in the Unicode Standard, somewhat strangely hidden
    under “Number Forms” in the “Symbols” chapter:
    http://www.unicode.org/versions/Unicode5.2.0/ch15.pdf#page=13

    This isn’t a simple issue, though. We cannot generally be expected to use
    superscript characters for exponents in mathematical expressions, for
    example. We can write 2³, but it would be absurd to require that all
    characters you might wish to use in exponents be included in Unicode as
    superscript characters.

    I guess the point here is that mathematical expressions are, by their very
    nature, not plain text. They have internal structure that needs to be
    expressed somehow, visually using e.g. superscripting, or in XML markup, or
    some other way. If such indications are simply omitted, the meaning is
    changed or lost, in general: x to the power of y becomes xy. Therefore,
    “flattening” or “linearizing” a mathematical expression into plain text
    needs to introduce some extra notations or annotations, effectively creating
    special markup or special operators, like x<pow>y</pow> or x**y or x^y, or
    changing the mode of expression, e.g. pow(x,y). In a “flattening” process,
    superscript characters may have some use.
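
A rough sketch of the difference between simply dropping the superscript information and flattening it into an explicit notation. The function name flatten_superscripts and the "^( )" notation are arbitrary choices made for this illustration, not anything defined by Unicode. It only handles characters whose decomposition carries the <super> tag.

```python
import unicodedata

def flatten_superscripts(text):
    """Rewrite each run of <super> characters as ^(...) using their base characters."""
    out = []
    run = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<super> "):
            # Map the code points listed after the tag back to base characters.
            run.append("".join(chr(int(h, 16)) for h in decomp.split()[1:]))
        else:
            if run:
                out.append("^(" + "".join(run) + ")")
                run = []
            out.append(ch)
    if run:
        out.append("^(" + "".join(run) + ")")
    return "".join(out)

# Compatibility normalization alone omits the structure: "2³" turns into "23".
print(unicodedata.normalize("NFKC", "2³"))   # 23

# An explicit operator keeps the meaning in plain text.
print(flatten_superscripts("2³ + x²"))       # 2^(3) + x^(2)
```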

    See also “Unicode in XML and other Markup Languages”, Unicode Technical
    Report #20, section 5.6 Superscripts and Subscripts:
    http://unicode.org/reports/tr20/#Superscripts

    However, there are often strong practical reasons to use formatting or
    markup instead of superscript or subscript characters, even when the latter
    would be more appropriate in principle. For example, in a phonetic notation
    like [nʲet], where “ʲ” (U+02B2) indicates palatalization, it might be wise
    to replace that character by normal “j” formatted or marked up as a
    superscript, even though this implies that when treated as plain text, the
    notation becomes [njet], which is semantically wrong (though might still be
    understood sufficiently well).
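
The trade-off can be sketched as follows, assuming HTML-style <sup> markup and a deliberately crude tag stripper (strip_tags is a hypothetical helper, not a real plain-text converter). The marked-up form loses the distinction when reduced to plain text, while the form using U+02B2 keeps it.

```python
import re

marked_up = "[n<sup>j</sup>et]"   # ordinary "j" with superscript markup
encoded = "[n\u02b2et]"           # U+02B2 MODIFIER LETTER SMALL J

def strip_tags(text):
    """Crude plain-text extraction: drop anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", text)

print(strip_tags(marked_up))  # [njet]
print(strip_tags(encoded))    # [nʲet]
```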

    > There are no plans to put more
    > superscripted and subscripted characters into the UNICODE standard.

    Can you cite a reference for that statement? I see no reason why such
    characters would not be added, when existing usage can be demonstrated where
    the character semantically differs from the corresponding normal character.

    > There is at least one good reason for providing an enlarged set of
    > unicode superscripted and subscripted characters. Formatting tools
    > are not always available, and in some cases when they are available
    > the tools require the deployment of APIs and editing tools that may
    > be less than optimal.

    I don’t think that constitutes a reason for including anything in the
    Unicode Standard. Expanding the standard with new “characters” for such
    reasons would have no end. It is also a very impractical approach. It surely
    takes much more time to have a large set of characters added to Unicode,
    included in at least some fonts, and supported by relevant software than to
    just fix some software that now has difficulties with, say, superscripting.

    > The superscripted x is not available in UNICODE,

    There is a superscripted x, namely U+02E3, MODIFIER LETTER SMALL X,
    “ˣ”. I don’t know why it has been included and what it is used for, but I
    would guess it is used in some phonetic notations, or maybe in the writing
    system(s) of some small language(s). Technically it is defined as having the
    compatibility decomposition “<super> 0078”, i.e. letter “x” in superscript
    style, but this does not imply that it would be wise to use it as superscript
    x in general, e.g. when expressing exp(x) as e raised to the power x. One
    practical reason for this is that you should expect font designers to draw
    the character so that its design matches its intended scope of usage. Another
    practical reason is that its font support is relatively limited, thereby
    limiting your typographic design as a whole.
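
The properties of U+02E3 can be inspected with Python's unicodedata module, as a quick check rather than an authoritative reading of the standard. The comparison with U+00B3 SUPERSCRIPT THREE is added here only to show that the two are classified differently (a modifier letter versus a number).

```python
import unicodedata

ch = "\u02e3"
print(unicodedata.name(ch))            # MODIFIER LETTER SMALL X
print(unicodedata.decomposition(ch))   # <super> 0078
print(unicodedata.category(ch))        # Lm (Letter, modifier)

three = "\u00b3"
print(unicodedata.name(three))         # SUPERSCRIPT THREE
print(unicodedata.category(three))     # No (Number, other)
```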

    -- 
    Yucca, http://www.cs.tut.fi/~jkorpela/ 
    

