Re: apostrophes

From: Asmus Freytag (
Date: Mon May 22 2006 - 15:44:06 CDT

    On 5/21/2006 9:47 PM, Jukka K. Korpela wrote:
    > On Sun, 21 May 2006, Kent Karlsson wrote:
    >> Because it is in ISO/IEC 8859. Hadn't ISO/IEC 8859-1 been so
    >> commonly supported, MICRO SIGN would have been canonically
    >> equivalent with GREEK SMALL LETTER MU.
    > Why? How did the common support to ISO/IEC 8859-1 dictate the decision
    > that was made?
    Without such common support, there would not have been a need to exactly
    clone the sequence of 256 code positions from 0000 to 00FF.

    Without that need, there would have been no reason not to simply have a
    single character for both mu and micro.

    > It would have been possible to make these characters canonically
    > equivalent or even the same, although it would have been somewhat odd
    > to have a Greek letter in the Latin 1 Supplement block and a
    > corresponding hole in the Greek block.
    Precisely, and that's something that would not have been acceptable to
    the Greek national member body of ISO/IEC JTC1/SC2. Had anyone attempted
    that, the Greek vote would have been negative on ISO/IEC 10646's
    alignment with Unicode. Incidentally, for those who forget history, the
    standard needed every single vote at that crucial point. That's the
    reason behind a number of design decisions that appear odd without
    taking into account the political dimension of securing acceptance of an
    unproven standard.
    > As far as I can see, it was a practical decision. It looks like a
    > natural decision to me, since the glyphs for these characters may well
    > be different, and the MICRO SIGN can be seen as a special symbol
    > historically based on GREEK SMALL LETTER MU rather than just its
    > specialized usage.
    I think the evidence is still in favor of viewing it as specialized
    usage, but the disunification does indeed allow relatively
    straightforward support for divergence in form.
    >>> The present justification is that U+00B5 does not belong to
    >>> any script, whereas U+03BC is in the Greek script.
    >> That's a mistake. It should be in the Greek script, of course,
    > U+00B5 has the Script value of Common, which might perhaps more
    > appropriately be characterized as belonging to _any_ script rather
    > than not belonging to any script. What its script _should_ be is less
    > obvious, but since it is only compatibility equivalent to a Greek
    > letter, the current situation looks natural. Similarly, for example,
    > ALEF SYMBOL is in the Common script, not in the Hebrew script.
    And very appropriately so.
    >> just like the OHM SIGN (which is canonically equivalent with
    > The OHM sign _is_ in the Greek script, and this is apparently based on
    > its being _canonically_ equivalent to a Greek letter (which was a
    > somewhat odd decision, but let's not go into that now).
    Coding the OHM sign separately is something that comes from East Asian
    character sets, some of which support both a code point for a symbol and
    a separate code point for the character of the Greek script. Carrying
    over that usage (and the similar usage of A with ring in contrast to
    Angstrom) was felt to be a mistake, and asserting canonical equivalence
    was seen as the best way to discourage any future differentiation
    between the halves of each pair.
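
    The practical difference between the two kinds of equivalence can be
    seen with any normalization library. The following is only a minimal
    sketch using Python's standard unicodedata module; the helper names
    nfc and nfkc are made up for illustration. Canonical normalization
    folds OHM SIGN and ANGSTROM SIGN into their counterparts, but leaves
    MICRO SIGN alone, because the latter is only compatibility equivalent
    to the Greek letter.

```python
import unicodedata

OHM_SIGN = "\u2126"
GREEK_CAPITAL_OMEGA = "\u03a9"
ANGSTROM_SIGN = "\u212b"
LATIN_A_WITH_RING = "\u00c5"
MICRO_SIGN = "\u00b5"
GREEK_SMALL_MU = "\u03bc"


def nfc(text: str) -> str:
    """Canonical normalization: folds only canonically equivalent characters."""
    return unicodedata.normalize("NFC", text)


def nfkc(text: str) -> str:
    """Compatibility normalization: also folds compatibility equivalents."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Canonical pairs collapse under NFC already.
    print(nfc(OHM_SIGN) == GREEK_CAPITAL_OMEGA)
    print(nfc(ANGSTROM_SIGN) == LATIN_A_WITH_RING)
    # MICRO SIGN survives NFC and is only folded to mu by NFKC.
    print(nfc(MICRO_SIGN) == MICRO_SIGN)
    print(nfkc(MICRO_SIGN) == GREEK_SMALL_MU)
    # The raw decomposition data shows the "<compat>" tag for MICRO SIGN.
    print(unicodedata.decomposition(MICRO_SIGN))
    print(unicodedata.decomposition(OHM_SIGN))
```

    In other words, data that passes through NFC cannot keep a distinction
    between ohm and omega, while the micro/mu distinction is preserved
    unless someone deliberately applies a compatibility normalization.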
    >> the latter of course the preferred
    >> character to denote the ohm unit symbol,
    > Yes, there is an explicit statement about that in the Unicode standard.
    >> just like GREEK SMALL
    >> LETTER MU is the preferred character for denoting the
    >> micro unit prefix symbol).
    > Have you found such a statement, or even an implicit preference, in
    > the Unicode standard, or some other standard? (Unfortunately,
    > standards related to the SI, as many other standards, define the use
    > of characters without identifying them by Unicode numbers or names.
    > Historically, this is understandable, but it creates considerable
    > vagueness in some cases.)

    If your Unicode to 8859-1 mapping table supports mapping Greek mu to
    micro sign, as well as the reverse, it would probably be preferable to
    use the mu consistently. However, that would break any software that was
    migrated to Unicode by straight 'widening' of code points and which
    might have a numerical constant identifying the micro sign.
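
    To make the trade-off concrete, here is a rough sketch in Python. The
    function names and the LEGACY_MICRO constant are hypothetical, chosen
    only for illustration; the one thing taken from the standard is that
    ISO/IEC 8859-1 byte 0xB5 corresponds to U+00B5. A mapping table that
    prefers mu on input and accepts both characters on output round-trips
    the bytes, but a legacy comparison against the widened constant no
    longer matches.

```python
MICRO_SIGN = "\u00b5"
GREEK_SMALL_MU = "\u03bc"

# Hypothetical constant of the kind found in software that was migrated
# by simply widening 8-bit code points to 16 bits.
LEGACY_MICRO = 0x00B5


def widen_latin1(data: bytes) -> str:
    """Straight widening: each 8859-1 byte becomes the same code point."""
    return data.decode("latin-1")


def from_latin1_prefer_mu(data: bytes) -> str:
    """Alternative table: map byte 0xB5 to GREEK SMALL LETTER MU instead."""
    return data.decode("latin-1").replace(MICRO_SIGN, GREEK_SMALL_MU)


def to_latin1(text: str) -> bytes:
    """Reverse table: accept both mu and micro sign, emit byte 0xB5."""
    return text.replace(GREEK_SMALL_MU, MICRO_SIGN).encode("latin-1")


def legacy_is_micro(ch: str) -> bool:
    """The kind of check that breaks once data uses mu consistently."""
    return ord(ch) == LEGACY_MICRO


if __name__ == "__main__":
    raw = b"5 \xb5m"
    widened = widen_latin1(raw)
    preferred = from_latin1_prefer_mu(raw)
    # Both forms map back to the same bytes.
    print(to_latin1(widened) == raw, to_latin1(preferred) == raw)
    # The legacy check only recognizes the widened form.
    print(legacy_is_micro(widened[2]), legacy_is_micro(preferred[2]))
```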

    The prevalence of such software may be diminishing now that Unicode has
    been around for a while, in favor of software that is written with
    Unicode in mind. If so, a gradual move to support the mu as the
    preferred character would be possible (as long as a provision is made to
    recognize the micro sign in existing data).
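
    One way such a provision might look, sketched in Python under the
    assumption that folding is applied only at comparison time and the
    stored data is left untouched (fold_micro and same_text are
    hypothetical helper names):

```python
# Translation table that folds MICRO SIGN (U+00B5) to
# GREEK SMALL LETTER MU (U+03BC) and changes nothing else.
_MICRO_TO_MU = str.maketrans({"\u00b5": "\u03bc"})


def fold_micro(text: str) -> str:
    """Return text with every micro sign replaced by Greek mu."""
    return text.translate(_MICRO_TO_MU)


def same_text(a: str, b: str) -> bool:
    """Compare two strings while treating micro sign and mu as equal."""
    return fold_micro(a) == fold_micro(b)


if __name__ == "__main__":
    old_data = "10 \u00b5F"   # written with MICRO SIGN
    new_data = "10 \u03bcF"   # written with GREEK SMALL LETTER MU
    print(old_data == new_data)
    print(same_text(old_data, new_data))
```

    A targeted fold like this is narrower than a full compatibility
    normalization, which would also rewrite many unrelated characters.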

    On the other hand, millions of existing Latin-1 keyboards will continue
    to support the micro sign while a smaller number of Greek keyboards
    will not.

    In a perfect world that issue would not exist, but in the real world,
    each transition has both benefits and costs. This small discrepancy is
    part of the costs of adopting a universal character set that needs to
    function compatibly vis-a-vis existing devices, data and software.

