Re: Clones (was RE: Hexadecimal)

From: Jim Allan
Date: Mon Aug 18 2003 - 18:41:08 EDT

    Peter Kirk posted:

    > Well, that's what was puzzling me about the recommendations not to use
    > these characters. In my opinion, there needs to be a clear statement
    > with each character definition (not somewhere in the text not linked to
    > it) of its status in such respects. Is it for compatibility use only? Is
    > it a presentation form not for use in general information interchange?
    > Is it a formatting variant of another character, which should be used if
    > that special formatting is to be indicated although the two might be
    > collated together?

    Perhaps each entry could carry a cross-reference to the places in the
    main text where that particular character or kind of character is
    discussed, when there is some special mention of it there.

    Otherwise the various indications of distinction and compatibility
    decomposition and canonical decomposition usually indicate a lot, if
    the reader looks at them and learns to understand them.

    But indeed the standard is somewhat inconsistent in sometimes coming
    close to recommending not using compatibility characters at all and in
    other cases recommending particular ones.

    > For example, if I want a superscript 2 to indicate "squared" (which
    > someone used on this list recently), am I supposed to use U+00B2, or
    > should I avoid using it and instead use a higher level markup (which
    > implies I need to use HTML e-mail)? Maybe the text tells me somewhere,
    > but it certainly doesn't in the code chart.

    Well if you are using unformatted text and want to use a superscript 2
    then you don't have much choice. I suppose I could have sent "E=mc^2",
    "E=mc{squared}", "E=mc<super>2" or something, but why would I when I have
    Unicode? :-)

    Actually superscript 2 is also in the Latin-1 character set. :-)
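
    The Latin-1 point is easy to check. The following is only a small
    sketch, assuming Python 3 and its built-in "latin-1" codec (neither is
    part of the original discussion); U+2074 SUPERSCRIPT FOUR is included
    purely as a contrasting example of a superscript outside Latin-1.

```python
sup2 = "\u00b2"  # SUPERSCRIPT TWO
sup4 = "\u2074"  # SUPERSCRIPT FOUR, chosen here only for contrast

# U+00B2 maps to the single Latin-1 byte 0xB2 and round-trips cleanly.
latin1_bytes = sup2.encode("latin-1")
round_trip = latin1_bytes.decode("latin-1")

# U+2074 has no Latin-1 byte, so encoding it fails.
try:
    sup4.encode("latin-1")
    sup4_in_latin1 = True
except UnicodeEncodeError:
    sup4_in_latin1 = False

print(latin1_bytes, round_trip == sup2, sup4_in_latin1)
```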

    In it states:

    << Therefore, the preferred means to encode superscripted letters or
    digits, such as “1st” or “DC0016”, is by style or markup in rich text. >>

    I would think that statement obvious since in technical writing and
    mathematical writing it is theoretically possible for any displayable
    character in Unicode to be superscripted or subscripted, and even
    superscripted or subscripted to an already superscript or subscript
    character, and so on.

    Also in the code chart, U+00B2 SUPERSCRIPT TWO is given a compatibility
    decomposition to "<super> *0032* 2". Similarly with other superscript
    characters.

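    The decomposition mentioned above can also be read out of the
    character database programmatically. This is a minimal sketch, assuming
    Python 3 and its standard unicodedata module (my choice of tool for
    illustration, not something the code chart prescribes). It shows that
    the compatibility mapping is applied only by the "K" normalization
    forms, so NFC leaves the superscript alone while NFKC folds it to a
    plain digit.

```python
import unicodedata

sup2 = "\u00b2"  # SUPERSCRIPT TWO
text = "E=mc" + sup2

name = unicodedata.name(sup2)
# The tag in angle brackets marks this as a compatibility decomposition.
decomp = unicodedata.decomposition(sup2)

# Canonical normalization keeps the superscript character.
nfc = unicodedata.normalize("NFC", text)
# Compatibility normalization replaces it with DIGIT TWO, losing the styling.
nfkc = unicodedata.normalize("NFKC", text)

print(name, "|", decomp, "|", nfc, "|", nfkc)
```
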
    But beyond all recommendations in the Unicode standard what is done
    depends on what the user wants to do for a particular purpose in a
    particular environment with particular fonts. There is no one correct
    way that fits all users at all places and times, nor should there be.

    If I am printing out a document on a particular system with particular
    software and fonts in which plain text superscripts look to me better
    than superscripts created by formatting regular numbers by the word
    processor I am using then I will naturally in that time and place use
    Unicode plain text superscripts.

    That Unicode gives me the choice is a benefit I should take advantage of
    without worrying that formatting regular numbers as superscript is
    theoretically better than using compatibility characters.

    Unicode is messy and complex mostly because character usage is messy and
    complex and display technology is messy and complex and there are always
    edge-cases and things that don't fit well.

    But Unicode's keeping deprecated individual character encodings while
    allowing applications to freely throw away non-deprecated canonically
    decomposable encodings (which supposedly only exist because they should
    not be thrown away) confuses me also.
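
    To make the "thrown away" case concrete, here is a hedged sketch,
    again assuming Python 3 and its standard unicodedata module. The
    character U+212B ANGSTROM SIGN is my own pick as an example of a
    non-deprecated character with a canonical decomposition; it does not
    come from the discussion above. Any process that normalizes the text
    replaces it, and the original code point cannot be recovered afterwards.

```python
import unicodedata

angstrom_sign = "\u212b"  # ANGSTROM SIGN (example chosen for illustration)
a_with_ring = "\u00c5"    # LATIN CAPITAL LETTER A WITH RING ABOVE

# No tag in angle brackets: this is a canonical decomposition.
decomp = unicodedata.decomposition(angstrom_sign)

# Both canonical forms discard U+212B itself.
nfc = unicodedata.normalize("NFC", angstrom_sign)
nfd = unicodedata.normalize("NFD", angstrom_sign)

print(decomp, nfc == a_with_ring, [hex(ord(c)) for c in nfd])
```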

    > I thought even deprecated ones were supposed to be usable, in that a
    > system should process them correctly.

    It depends on what is meant by "usable" and the "system" and
    "correctly". No system has to support all of Unicode. Accordingly I
    would not expect systems to support deprecated control characters or
    fonts to go out of their way to support deprecated characters.

    A system that does not support deprecated control codes (and even some
    of the non-deprecated control codes) and does not support particular
    characters (perhaps only because there are no fonts on the system with
    those characters) can still be conformant to Unicode in what it supports.

    A text editor that supports only fixed width fonts will probably not
    support the special-width spaces properly but may still be Unicode
    conformant.

    Jim Allan

    This archive was generated by hypermail 2.1.5 : Mon Aug 18 2003 - 19:10:45 EDT