From: Jim Allan (jallan@smrtytrek.com)
Date: Mon Aug 18 2003 - 18:41:08 EDT
Peter Kirk posted:
> Well, that's what was puzzling me about the recommendations not to use
> these characters. In my opinion, there needs to be a clear statement
> with each character definition (not somewhere in the text not linked to
> it) of its status in such respects. Is it for compatibility use only? Is
> it a presentation form not for use in general information interchange?
> Is it a formatting variant of another character, which should be used if
> that special formatting is to be indicated although the two might be
> collated together?
Perhaps a cross-reference to areas in the main text where that
particular character or kind of character is discussed when there is
some special mention in the main text.
Otherwise the various indications of distinction and compatibility
decomposition and canonical decomposition usually indicate a lot, if
the reader looks at them and learns to understand them.
But indeed the standard is somewhat inconsistent in sometimes coming
close to recommending not using compatibility characters at all and in
other cases recommending particular ones.
> For example, if I want a superscript 2 to indicate "squared" (which
> someone used on this list recently), am I supposed to use U+00B2, or
> should I avoid using it and instead use a higher level markup (which
> implies I need to use HTML e-mail)? Maybe the text tells me somewhere,
> but it certainly doesn't in the code chart.
Well, if you are using unformatted text and want to use a superscript 2
then you don't have much choice. I suppose I could have sent "E=mc^2",
"E=mc{squared}", "E=mc<super>2" or something, but why would I when I have
Unicode? :-)
Actually superscript 2 is also in the Latin-1 character set. :-)
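
A quick way to check the Latin-1 point, and to see how the Unicode Character Database classifies this character, is sketched below. It assumes Python's standard unicodedata module, which is only one convenient tool and not anything the standard prescribes; the values reported come from whatever Unicode version the Python build bundles.

```python
import unicodedata

SUPERSCRIPT_TWO = "\u00b2"

# Name and raw decomposition field from the Unicode Character Database.
# A leading <tag> marks a compatibility (not canonical) decomposition.
name = unicodedata.name(SUPERSCRIPT_TWO)
decomp = unicodedata.decomposition(SUPERSCRIPT_TWO)

text = "E=mc" + SUPERSCRIPT_TWO

# Canonical normalization leaves the superscript alone;
# compatibility normalization folds it to an ordinary digit.
nfc = unicodedata.normalize("NFC", text)
nfkc = unicodedata.normalize("NFKC", text)

# The same character exists in Latin-1 (ISO 8859-1) as a single byte.
latin1_bytes = SUPERSCRIPT_TWO.encode("latin-1")

print(name)
print(decomp)
print(nfc == text, nfkc)
print(latin1_bytes)
```

Note that the compatibility folding loses the "squared" meaning entirely, which is one reason plain-text users may prefer to keep U+00B2.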
In http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf it states:
<< Therefore, the preferred means to encode superscripted letters or
digits, such as “1st” or “DC0016”, is by style or markup in rich text. >>
I would think that statement obvious since in technical writing and
mathematical writing it is theoretically possible for any displayable
character in Unicode to be superscripted or subscripted, and even
superscripted or subscripted to an already superscript or subscript
character, and so on.
Also in the code chart (http://www.unicode.org/charts/PDF/U0080.pdf)
U+00B2 SUPERSCRIPT TWO is given a compatibility decomposition to
"<super> *0032* 2". Similarly with other superscript characters.
But beyond all recommendations in the Unicode standard what is done
depends on what the user wants to do for a particular purpose in a
particular environment with particular fonts. There is no one correct
way that fits all users at all places and times, nor should there be.
If I am printing out a document on a particular system with particular
software and fonts in which plain text superscripts look better to me
than superscripts created by formatting regular numbers in the word
processor I am using, then I will naturally use Unicode plain text
superscripts in that time and place.
That Unicode gives me the choice is a benefit I should take advantage of
without worrying that formatting regular numbers as superscript is
theoretically better than using compatibility characters.
Unicode is messy and complex mostly because character usage is messy and
complex and display technology is messy and complex and there are always
edge-cases and things that don't fit well.
But Unicode's keeping deprecated individual character encodings while
allowing applications to freely throw away non-deprecated canonically
decomposable encodings (which supposedly only exist because they should
not be thrown away) confuses me also.
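
As an illustration of what "throwing away" can look like in practice, here is a minimal sketch, again assuming Python's unicodedata module. U+212B ANGSTROM SIGN is chosen purely as an example of a character with a canonical (singleton) decomposition; a process that normalizes text will not preserve it.

```python
import unicodedata

ANGSTROM_SIGN = "\u212b"   # ANGSTROM SIGN
A_WITH_RING = "\u00c5"     # LATIN CAPITAL LETTER A WITH RING ABOVE

# No <tag> in the decomposition field means the mapping is canonical.
singleton = unicodedata.decomposition(ANGSTROM_SIGN)

# Even the composing form (NFC) replaces the Angstrom sign.
nfc_of_angstrom = unicodedata.normalize("NFC", ANGSTROM_SIGN)

# By contrast, the precomposed letter survives an NFD/NFC round trip.
nfd = unicodedata.normalize("NFD", A_WITH_RING)
roundtrip = unicodedata.normalize("NFC", nfd)

print(singleton)
print(nfc_of_angstrom == A_WITH_RING)
print([hex(ord(c)) for c in nfd], roundtrip == A_WITH_RING)
```

After normalization the original code point cannot be recovered, whereas a compatibility character such as U+00B2 passes through NFC and NFD untouched.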
> I thought even deprecated ones were supposed to be usable, in that a
> system should process them correctly.
It depends on what is meant by "usable" and the "system" and
"correctly". No system has to support all of Unicode. Accordingly I
would not expect systems to support deprecated control characters or
fonts to go out of their way to support deprecated characters.
A system that does not support deprecated control codes (and even some
of the non-deprecated control codes) and does not support particular
characters (perhaps only because there are no fonts on the system with
those characters) can still be conformant to Unicode in what it supports.
A text editor that supports only fixed width fonts will probably not
support the special-width spaces properly but may still be Unicode
conformant.
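
The special-width spaces make a handy test case here. The sketch below, assuming Python's unicodedata module and an arbitrary selection of three such spaces, shows that the Unicode Character Database gives each of them a compatibility decomposition to the ordinary U+0020 SPACE, which is roughly what a fixed-width display ends up treating them as.

```python
import unicodedata

# A few of the special-width spaces (selection is illustrative only).
SPACES = ["\u2002", "\u2003", "\u2009"]

names = [unicodedata.name(ch) for ch in SPACES]
decomps = [unicodedata.decomposition(ch) for ch in SPACES]

# Compatibility normalization folds the width distinction away.
folded = unicodedata.normalize("NFKC", "a\u2003b")

for ch, name, decomp in zip(SPACES, names, decomps):
    print(f"U+{ord(ch):04X} {name}: {decomp}")
print(repr(folded))
```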
Jim Allan
This archive was generated by hypermail 2.1.5 : Mon Aug 18 2003 - 19:10:45 EDT