Re: Clones (was RE: Hexadecimal)

From: Jim Allan
Date: Mon Aug 18 2003 - 12:06:37 EDT

    Jill Ramonsky posted:

    > I would really like it if these, and
    > every single other character which is "only there for reasons of round trip
    > compatibility" with something else, were explicitly marked in the
    > machine-readable charts with something meaning "Don't introduce this
    > character, at all, ever. Don't try to interpret it. Just preserve it, in
    > case it ever gets turned back to its original character set".

    That would probably be too strong.

    If characters are available then some people will use them. :-(

    See section 2.3 at

    Unicode 3.0 contained under section D21 on compatibility characters:

    << Their use is discouraged other than for legacy data. >>

    I don't know whether this statement was intentionally removed or was
    accidentally dropped in the changes in 4.0, which distinguish
    "compatibility character" from "compatibility composite character".

    In any case people can't be prevented from doing things that are
    officially discouraged, especially as for some particular use it might
    be wrong to discourage them. So if you are handling Roman numerals in an
    application and wish your handling to be complete then unfortunately you
    do have to take the compatibility Roman numerals into account.
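    A minimal sketch in Python of what taking them into account might
    look like, assuming the standard unicodedata module. The helper name
    fold_roman_numerals is made up for this illustration. It maps only the
    characters in U+2160..U+217F to ordinary letters through their
    compatibility decompositions and leaves everything else alone.

```python
import unicodedata

def fold_roman_numerals(text: str) -> str:
    """Replace compatibility Roman numerals (U+2160..U+217F) with plain letters.

    Hypothetical helper for illustration; other compatibility characters
    in the text are deliberately left untouched.
    """
    out = []
    for ch in text:
        if 0x2160 <= ord(ch) <= 0x217F:
            # Each character in this range has a <compat> decomposition
            # to ordinary Latin letters, which NFKC applies.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)

print(fold_roman_numerals("Chapter \u2167"))   # U+2167 ROMAN NUMERAL EIGHT
print(unicodedata.numeric("\u2167"))           # numeric value stored for U+2167
```

    Applying NFKC to the whole string would be shorter, but it would also
    fold every other compatibility character, which may be more than an
    application wants.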

    > U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
    > uses this?

    People concerned with proper appearance of the symbol in proportional
    fonts. Almost all proportional fonts use a narrow hyphen dash rather
    than a minus-width dash for the hyphen-minus character. In some
    older-style fonts it is even a slanting character.

    See in 6.2 for a
    detailed discussion of the various dash characters.

    > U+2217 (asterisk operator) - an equally obvious clone of U+002A
    > (asterisk)

    They look much the same in a typewriter style font. They don't do so in
    proportional fonts where the regular asterisk tends to appear somewhat
    like a superscript.

    Unicode provides support both for good typographical usage and for
    traditional data-processing typographical usage based on
    typewriter technology.

    > U+223C (tilde operator) - a clone of U+007E (tilde)

    See and look for
    "Spacing Clones of Diacritics".

    The ASCII tilde was originally intended to be a non-spacing diacritic
    tilde to be applied to other characters by backspace. In part because of
    the low resolution of many early data-processing printers it was often
    realized in a tilde operator form. That has now become its most normal
    form in fonts.

    But for good typography you do want to distinguish them and the
    overloading of tilde as ASCII 7E means that a font may render a
    mathematical full-character tilde when you want to show a diacritic or
    render a spacing diacritic when you wanted a mathematical operator.

    Unicode is intended for typesetting applications as well as entering
    computer code in a traditional typewriter style character set with
    typewriter limitations.

    > and then there's
    > U+2223 (divides) - hell, that looks to me remarkably like U+007C
    > (vertical line)

    They do look close. But U+007C usually extends below the baseline and
    U+2223 usually doesn't.
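    The four pairs discussed so far can be inspected with a short Python
    sketch, assuming the standard unicodedata module. The names PAIRS,
    describe and merged_by_nfkc are invented for this illustration. It
    prints each character's name and general category and checks whether
    compatibility normalization (NFKC) would fold the pair together.

```python
import unicodedata

# (ASCII character, mathematical look-alike) pairs from the discussion above.
PAIRS = [
    ("\u002D", "\u2212"),  # hyphen-minus / minus sign
    ("\u002A", "\u2217"),  # asterisk / asterisk operator
    ("\u007E", "\u223C"),  # tilde / tilde operator
    ("\u007C", "\u2223"),  # vertical line / divides
]

def describe(ch: str) -> str:
    """Return code point, character name and general category."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{unicodedata.category(ch)}]"

def merged_by_nfkc(a: str, b: str) -> bool:
    """True if compatibility normalization maps both characters to the same string."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

for ascii_ch, math_ch in PAIRS:
    print(describe(ascii_ch), "|", describe(math_ch),
          "| merged by NFKC:", merged_by_nfkc(ascii_ch, math_ch))
```

    None of these pairs is merged, so as far as the character database is
    concerned the operators are separate characters in their own right and
    not compatibility equivalents of the ASCII ones.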

    > For example:
    > U+2264 (less than or equal to) - compare with U+2A7D (less than or
    > slanted equal to)

    I have no idea. You will probably have to ask the MathML people about
    that one. See
    Mathematicians seem to think they need to distinguish the two.

    As a non-mathematician I find many of these distinctions bewildering and
    seemingly only typographical. But if mathematicians in some field make
    fine distinctions based on such differences then it is important that
    Unicode allow such distinctions to be maintained in plain text.

    > In defence of this argument, I point out that the
    > complementary relation, NOT equal to, has codepoint U+2270, and this is
    > represented in the code charts as having a slanted equal to, so it OUGHT to
    > be the complement of U+2A7D. (Unless I've missed it, there appears to be no
    > "not equal to with horizontal equals" character).

    The chart at does not show a
    slanted equals.
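    The character data can be checked as well as the chart glyph. A small
    Python sketch, assuming the standard unicodedata module (the constant
    names and the helper canonical_parts are invented here), lists the
    canonical decomposition of U+2270:

```python
import unicodedata

LE = "\u2264"          # LESS-THAN OR EQUAL TO
SLANTED_LE = "\u2A7D"  # LESS-THAN OR SLANTED EQUAL TO
NOT_LE = "\u2270"      # NEITHER LESS-THAN NOR EQUAL TO

def canonical_parts(ch: str) -> list[str]:
    """List the characters a single character canonically decomposes into (NFD)."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}"
            for c in unicodedata.normalize("NFD", ch)]

for ch in (LE, SLANTED_LE, NOT_LE):
    print(f"U+{ord(ch):04X}", "->", canonical_parts(ch))
```

    U+2270 decomposes to U+2264 followed by U+0338 (combining long solidus
    overlay), which suggests it is treated as the negation of the
    horizontal form and not of U+2A7D, whatever a given font draws.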

    For some discussion of the math symbols see also

    Part of the problem is that differences that are in most environments
    only typographical style differences may indicate semantic differences
    in particular disciplines. It is impossible to establish a firm line as
    to how important or common what would normally be a stylistic variation
    must be before it should be encoded in Unicode for plain text distinctions.

    For example, open-loop _g_ is distinguished from closed-loop _g_ in the
    International Phonetic Alphabet and so Unicode encodes it separately at

    A normal Latin Letter font would probably not have U+0261 in it at all
    and might display U+0067 with either closed or open loop. But a font for
    phonetic use should always display U+0067 with a closed loop.

    Fonts like Arial Unicode MS lose the distinction.

    For non-technical use people need not and mostly quite rightly will not
    use the more technical symbols to make fine distinctions that don't
    apply in their particular usage.

    Jim Allan
