From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Jun 29 2008 - 22:56:11 CDT
David Starner wrote:
> But the terminal is not remotely a plain text application. It
> already handles a wide variety of formatting, like bold and
> italics, and there's absolutely no reason you couldn't add
> subscript and superscript, or even full Tex-like markup.
> Extending plain text is frequently not the right way to
> attack a problem.
Exactly!
In fact, as soon as you start extending Unicode into something it is not, you
immediately realize that you will need to encode subscript and superscript
variants of almost all existing "normal" base characters, and then do the
same for other font variants. For all of this, use a markup language.
This just shows that superscripts and subscripts were encoded for
compatibility only, and that without this need they should never have been
encoded, not even for plain text, where other linear notations/conventions
would have been used instead (for example "5.1e22", commonly used instead of
"5.1×10²²", or "10 km^2" instead of "10 km²").
And you would also need more superscript and subscript levels (for this,
notations like TeX or MathML can be transported in plain text by using their
conventional syntax). Plain text is not made to transport the text layout,
just the basic semantics; for the rest you need some other convention,
notation, or higher-level protocol... This is just like natural written
languages with their conventional orthographies, which Unicode does not
encode either: otherwise we would need a separate Altaic alphabet for
Turkish, a Latin alphabet for English, and another Latin alphabet for German
with the special handling of umlauts (at the linguistic level only) like
vowels...
So there is really no end to the desire to encode contextual variants as new
characters. As the need for variants is orthogonal to the need to support
a large set of characters, the only safe way is not to encode contextual
variants, as far as possible, but only the common abstract characters, and
to decide that layout and style information is not part of the standard and
requires another, higher-level protocol.
As a general rule, if two uses of some characters carry the same visual value
and interpretation when seen outside the context in which they appear, and if
they obey the same composition rules in arbitrary layouts, then they have to
share the same encoding as abstract characters, even if they have several
distinctive contextual realizations. Superscripts and subscripts, for
example, are not different from normal script when seen in isolation: there
is just a difference of default size or position, but text size and position
are not encoded in any character itself, and the characters remain readable
and meaningful even in this context.
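As a rough illustration of this point (my own sketch, not something from the
standard's text): the Unicode Character Database records the encoded
superscript and subscript digits as compatibility variants of the ordinary
digits, tagged <super> or <sub>. The helper name compat_base below is
hypothetical; only the unicodedata calls come from the Python standard
library.

```python
import unicodedata

def compat_base(ch):
    """Return (tag, base) from a compatibility decomposition, else None.

    Canonical decompositions (no <tag> prefix) and characters without
    any decomposition both yield None.
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp.startswith("<"):
        return None
    tag, *codepoints = decomp.split()
    base = "".join(chr(int(cp, 16)) for cp in codepoints)
    return tag, base

# Superscript two, subscript two, and an ordinary letter for comparison.
for ch in "²₂a":
    print(repr(ch), compat_base(ch))
```

The tag carries only the layout distinction; the base character it points to
is the same abstract character in both cases.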
The layout may add information of its own, independently of the
context-neutral semantics of the plain-text characters that it is
augmenting. If you convert a text with layout to plain text and
completely drop the layout information without converting it to some
notation, this is where you may lose or change the semantics, for example
when converting "10²" to "102". This is not the fault of Unicode; it is
your fault for not introducing and conveying some alternative notation like
"10^2", either by stating explicitly in your plain-text conventions that this
notation is used or by specifying it as meta-information parallel to the
transmission of the text itself.
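A minimal sketch of the difference, under my own assumptions: NFKC
compatibility normalization is used here as one example of a conversion that
silently drops the superscript layout, and to_caret_notation is a
hypothetical helper that keeps the information by switching to the "^"
convention mentioned above. It only handles the superscript digits, nothing
else.

```python
import unicodedata

SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
TO_PLAIN_DIGIT = str.maketrans(SUPERSCRIPTS, "0123456789")

def to_caret_notation(text):
    """Rewrite each run of superscript digits as '^' plus plain digits."""
    out = []
    in_super = False
    for ch in text:
        if ch in SUPERSCRIPTS:
            if not in_super:
                out.append("^")  # mark the start of a superscript run
                in_super = True
            out.append(ch.translate(TO_PLAIN_DIGIT))
        else:
            in_super = False
            out.append(ch)
    return "".join(out)

# Dropping the layout: "10²" silently becomes "102".
lossy = unicodedata.normalize("NFKC", "10²")

# Converting the layout to a stated notation instead: "10²" becomes "10^2".
kept = to_caret_notation("10²")
print(lossy, kept)
```

The receiver still has to know that "^" means exponentiation, which is
exactly the convention or meta-information that has to travel with the text.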
(Note that the encoded modifier letters and IPA symbols are NOT true
superscripts, as they are really meant as distinctive elements where the
choice of the borrowed letter is quite arbitrary: they can't be used to
write arbitrary words in the Latin alphabet, for example, and they
are not necessarily designed to line up properly on a superscript
baseline.) To write regular words or even full sentences in superscript, use
some conventional notation (like punctuation) or layout
structure/syntax/protocol, but encode the words themselves using the regular
letters and everyone will be happy.
This archive was generated by hypermail 2.1.5 : Mon Jun 30 2008 - 10:08:24 CDT