Re: sub and superscripts

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jul 05 2005 - 14:40:18 CDT

    > Mark E. Shoulson wrote:
    > > Gregg Reynolds wrote:
    > >
    > >> ... but I would observe that Unicode is
    > >> capable of accommodating e.g. bidi-override marks and various similar
    > >> "characters"; so why not a <subscript> and <popsubscript> mark, for
    > >> example.

    And the answer to that is that unlike the bidirectional case, the
    introduction of subscript formatting operators into Unicode would
    do nothing whatsoever to improve upon existing text representation,
    and would, in fact, have the contrary impact of introducing confusion
    and ambiguity into text.
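    For readers who have not met the "bidi-override marks" in the quoted
    proposal, the minimal Python sketch below uses the standard
    unicodedata module to show what they are: format characters (general
    category Cf) with their own bidi classes, which the Unicode
    Bidirectional Algorithm consults when it computes display order. The
    paired override/pop structure is what the proposed <subscript> and
    <popsubscript> marks would imitate. The helper name describe is
    illustrative only, not part of any library.

```python
import unicodedata

# The paired bidi override controls that the quoted proposal takes as its model.
BIDI_CONTROLS = ["\u202D", "\u202E", "\u202C"]


def describe(ch: str) -> tuple:
    """Return (code point, name, general category, bidi class) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


for ch in BIDI_CONTROLS:
    print(describe(ch))
# ('U+202D', 'LEFT-TO-RIGHT OVERRIDE', 'Cf', 'LRO')
# ('U+202E', 'RIGHT-TO-LEFT OVERRIDE', 'Cf', 'RLO')
# ('U+202C', 'POP DIRECTIONAL FORMATTING', 'Cf', 'PDF')
```

    These controls earn their place because bidi ordering cannot always be
    recovered from the letters alone; the argument above is that
    subscripting has no comparable gap in plain text representation.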

    > >>
    > > We could call them "[" and "]", for consistency with existing practice.
    > > This is a typesetting issue, not a plaintext one.
    > >
    > I'm not sure I understand what you mean. "[" and "]" have well-defined
    > plaintext semantics. If we overlay "subscript" on those semantics, then
    > we no longer have plaintext. Nor do we have markup; we have a
    > redefinition of the codepoints.

    Actually, none of the above. What we have is different conventions
    for usage of (plain) text.

    The text "b[i]", represented (in Unicode) by the plain text
    sequence <0062, 005B, 0069, 005D>, may, in one context, with
    one set of textual conventions, be interpreted as "the i-th element
    of the array b". In another context, with another set of
    textual conventions, it may be interpreted as "the word 'bi',
    with the existence of the letter 'i' inferred, although missing
    in the original epigraph (or other physical source)."
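    To make the point concrete, here is a small Python sketch, assuming
    two hypothetical reader-side conventions (the function names
    read_as_array_access and read_as_restored_word are invented for
    illustration). The encoded code points are identical in both cases;
    only the convention applied by the reader differs.

```python
def code_points(text: str) -> list:
    """Return the code points of text as 4-digit hex strings, as written above."""
    return [f"{ord(ch):04X}" for ch in text]


# Hypothetical convention A: programming notation, b[i] is element i of array b.
def read_as_array_access(text: str) -> tuple:
    name, rest = text.split("[", 1)
    return name, rest.rstrip("]")


# Hypothetical convention B: epigraphic editing, [...] marks restored letters.
def read_as_restored_word(text: str) -> str:
    return text.replace("[", "").replace("]", "")


text = "b[i]"
print(code_points(text))            # ['0062', '005B', '0069', '005D']
print(read_as_array_access(text))   # ('b', 'i')
print(read_as_restored_word(text))  # bi
```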

    In the first context, a commonly used typographical convention
    is to represent the same concept with a math italic b and a
    subscripted i. In the second context, that presentation would
    not be equivalent. At any rate, there is no presumption that
    the mathematical presentation should be automatically derived
    from plain text without the imposition of *some* level of
    markup as well -- since mathematical typesetting is, in general,
    complex.
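    As a rough illustration of where the subscripting actually lives, the
    Python sketch below derives markup (an HTML <sub> element, or a LaTeX
    subscript) from the plain text, but only after the array convention
    has been chosen. The function names are hypothetical; the point is
    that the presentation is added in a markup layer while the underlying
    characters stay as they were.

```python
import html


def parse_array_access(text: str) -> tuple:
    """Split 'name[index]' under the (assumed) programming convention."""
    name, rest = text.split("[", 1)
    return name, rest.rstrip("]")


def array_access_to_html(text: str) -> str:
    """Italic name with subscripted index, expressed as HTML markup."""
    name, index = parse_array_access(text)
    return f"<i>{html.escape(name)}</i><sub>{html.escape(index)}</sub>"


def array_access_to_latex(text: str) -> str:
    """Same presentation as LaTeX math; math mode italicizes the variable."""
    name, index = parse_array_access(text)
    return f"{name}_{{{index}}}"


print(array_access_to_html("b[i]"))   # <i>b</i><sub>i</sub>
print(array_access_to_latex("b[i]"))  # b_{i}
```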

    > Also I don't understand the distinction
    > between "typesetting issue" and "plaintext issue". Plaintext must be
    > typeset.

    Plain text must be *rendered* legibly to be read (at least in anything
    other than a debugger). That does not mean that all typographical
    issues for text become, ipso facto, plain text issues.

    There are issues in the representation of text that fall outside
    the realm generally considered to be appropriate to the encoding
    of the plain text elements represented by "characters" in the
    Unicode Standard.

    And there is general consensus now within the character encoding
    community that superscripting and subscripting fall outside
    that boundary. The exceptions you see in the Unicode Standard
    are either a) for legacy compatibility with older character
    sets or b) special purpose characters for representation of
    letters in technical phonetic representations.
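    Both kinds of exception are visible in the Unicode Character Database.
    The Python sketch below (standard unicodedata module; the sample
    characters and the helper name compat_info are chosen for
    illustration) prints the compatibility decompositions of U+00B2
    SUPERSCRIPT TWO (inherited from Latin-1), U+2082 SUBSCRIPT TWO, and
    U+02B0 MODIFIER LETTER SMALL H, a phonetic modifier letter. Each is
    tagged <super> or <sub>, and NFKC normalization folds them back to
    ordinary characters, discarding the super/subscript distinction.

```python
import unicodedata

# Illustrative picks: two legacy compatibility digits and one phonetic modifier letter.
SAMPLES = ["\u00B2", "\u2082", "\u02B0"]


def compat_info(ch: str) -> tuple:
    """Return (code point, character name, compatibility decomposition)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch)


for ch in SAMPLES:
    print(*compat_info(ch))
# U+00B2 SUPERSCRIPT TWO <super> 0032
# U+2082 SUBSCRIPT TWO <sub> 0032
# U+02B0 MODIFIER LETTER SMALL H <super> 0068

# NFKC replaces the compatibility forms with their plain equivalents.
print(unicodedata.normalize("NFKC", "H\u2082O x\u00B2"))  # H2O x2
```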

    --Ken



    This archive was generated by hypermail 2.1.5 : Tue Jul 05 2005 - 14:42:23 CDT