RE: Case mapping errors?

From: Karlsson Kent - keka (keka@im.se)
Date: Thu Jun 22 2000 - 06:06:41 EDT


(This message is sent in UTF-8. Flames regarding that fact
will be deleted without response.)

No, those case mappings are not in error. Nor are their
canonical mappings in error. (The MICRO SIGN would have
had a canonical mapping to Greek mu, if it had not been
included in such widely used repertoires as Latin-1.)
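
A small illustration, using Python's standard unicodedata module
(my choice of tool, nothing the standard prescribes), of what the
MICRO SIGN actually carries: a compatibility (not canonical)
decomposition to mu, and an uppercase mapping that does not lead
back to MICRO SIGN when lowercased again.

```python
import unicodedata

MICRO = "\u00b5"  # MICRO SIGN (Latin-1 0xB5)
MU = "\u03bc"     # GREEK SMALL LETTER MU

# Compatibility decomposition only, hence the <compat> tag ...
print(unicodedata.decomposition(MICRO))               # <compat> 03BC
# ... so NFC leaves the sign alone, while NFKC folds it to the Greek letter.
print(unicodedata.normalize("NFC", MICRO) == MICRO)   # True
print(unicodedata.normalize("NFKC", MICRO) == MU)     # True

# Uppercasing gives GREEK CAPITAL LETTER MU; lowercasing that gives the
# ordinary mu, not MICRO SIGN, so there is no round trip.
upper = MICRO.upper()
print(f"U+{ord(upper):04X}")                          # U+039C
print(upper.lower() == MU)                            # True
```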

For the PROSGEGRAMMENI, it's my understanding that it is
customary (e.g. in dictionaries) to capitalise it the way
it is done in Unicode. (But I don't know classical Greek.)
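
For reference, a quick check (again with Python's unicodedata, purely
as an illustration) showing that the PROSGEGRAMMENI is canonically
just an iota, and that it uppercases to a full capital iota:

```python
import unicodedata

PROSGEGRAMMENI = "\u1fbe"  # GREEK PROSGEGRAMMENI
IOTA = "\u03b9"            # GREEK SMALL LETTER IOTA
CAPITAL_IOTA = "\u0399"    # GREEK CAPITAL LETTER IOTA

# Canonical singleton decomposition (no tag), so NFC replaces it by iota.
print(unicodedata.decomposition(PROSGEGRAMMENI))              # 03B9
print(unicodedata.normalize("NFC", PROSGEGRAMMENI) == IOTA)   # True
# Its uppercase mapping is the capital iota.
print(PROSGEGRAMMENI.upper() == CAPITAL_IOTA)                 # True
```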

The MICRO, OHM, KELVIN, and ANGSTROM (ÅNGSTRÖM, really) SIGNs
are included in Unicode for compatibility reasons only. You
should not use them; use instead the characters that they canonically
(or 'near canonically' in the case of MICRO SIGN) decompose to.
Note that there are many (SI or other) unit symbols that are *not*
included as separate characters, like the symbols for Watt, Volt, etc.
Nor is there any need to include them. Those symbols are just
letters reused as unit symbols. The case mappings for these
signs derive from the characters that they (near) canonically
map to. It's true that you should never case-change a unit
symbol or unit prefix symbol, but that also goes for W, V, m, M,
etc., even though those can only be represented by "LETTER"
characters.
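
To make this concrete, here is a short sketch (Python's unicodedata,
used only for illustration) showing that normalization already turns
each of these SIGNs into the ordinary letter, that the case mapping is
simply the letter's, and that case-changing a letter-based unit symbol
goes wrong with or without any SIGN:

```python
import unicodedata

# Letterlike "SIGN" characters and the letters they canonically decompose to.
SIGNS = {
    "\u2126": "\u03a9",  # OHM SIGN      -> GREEK CAPITAL LETTER OMEGA
    "\u212a": "K",       # KELVIN SIGN   -> LATIN CAPITAL LETTER K
    "\u212b": "\u00c5",  # ANGSTROM SIGN -> LATIN CAPITAL LETTER A WITH RING ABOVE
}

for sign, letter in SIGNS.items():
    # Canonical singletons do not survive NFC: each sign becomes the letter.
    assert unicodedata.normalize("NFC", sign) == letter
    # The lowercase mapping is just that of the letter.
    print(unicodedata.name(sign), "->", f"U+{ord(sign.lower()):04X}")
# OHM SIGN -> U+03C9
# KELVIN SIGN -> U+006B
# ANGSTROM SIGN -> U+00E5

# Lowercasing a plain-letter unit symbol is just as wrong:
print("5 MW".lower())  # 5 mw  (the prefix M, mega, has become m, milli)
```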

As far as I know, the inclusion of the MICRO and OHM signs
derives from their inclusion in repertoires that otherwise
contain only Latin letters (and punctuation); apparently
someone found these Greek letters important enough (for use
in writing unit designations) to include those two Greek letters
with names reflecting why they were included. This does
not change the fact that they are really just ordinary Greek
letters. For the Kelvin and Ångström signs I can only speculate
as to why they were included in a Korean encoding that served as
a source for Unicode. My theory is that the Kelvin sign started out
as a DEGREE KELVIN, in analogy with the DEGREE CELSIUS and
DEGREE FAHRENHEIT signs (which have a (small) justification as
ligatures, especially in CJK typography), until someone pointed
out that it's not called (nor written) "degree Kelvin" but just
"Kelvin". My theory about the Ångström sign's original inclusion
in that Korean encoding is that someone might have thought
that the A with a ring was not just a letter, but some
specially invented symbol (an easy mistake to make if you only
know that unit as "angstrom"). It's not a specially invented
symbol; it's just the first letter of Mr Ångström's name, just
like for Watt, Volt, Kelvin, ...

The case mappings are correct, but you should never apply
any case mapping to unit symbols that are letters. Getting
software to "understand" what is a unit symbol (without
special markup) and what is not might be tricky when the
unit symbols are written with letters (as all SI unit symbols,
except for the degree sign, and many other unit symbols are)...
And no, re-including all letters (or letter combinations) as
"signs" for each and every use that letters have been put to
(e.g. unit signs) is not an appropriate solution.
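
One way around the problem is explicit markup. A minimal sketch in
Python, assuming a purely hypothetical inline marker {u:...} for unit
symbols (not any existing convention): ordinary text is case-mapped,
marked unit symbols are copied unchanged.

```python
import re

# Hypothetical markup for this sketch only: {u:...} marks a unit symbol.
UNIT = re.compile(r"\{u:([^}]*)\}")

def lower_except_units(text: str) -> str:
    """Lowercase text, leaving marked unit symbols as-is (markup is dropped)."""
    out = []
    pos = 0
    for m in UNIT.finditer(text):
        out.append(text[pos:m.start()].lower())  # ordinary text: case-mapped
        out.append(m.group(1))                   # unit symbol: copied unchanged
        pos = m.end()
    out.append(text[pos:].lower())               # trailing ordinary text
    return "".join(out)

print(lower_except_units("OUTPUT: 5 {u:MW} AT 20 {u:kV}"))
# output: 5 MW at 20 kV
```

Without such markup the function has no way of telling the unit "MW"
from an ordinary word, which is exactly the difficulty described above.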

Please never use those "SIGN"s, except when mapping those
letters to character repertoires that do not contain the
proper Greek letters but do contain those "SIGN"s. Nor
should you use any of the other "squared" unit characters,
except when you absolutely have to get the "squared"
typographic effect (ugly in my eyes) in CJK
typography from plain text. Note also that there are
many (composite) (SI) unit designations that do not have
any "squared" character associated with them. The "squared"
unit characters are a rather random collection, best forgotten.
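
The one legitimate use, mapping to a legacy repertoire, can be kept
at the conversion boundary. A sketch in Python, with hypothetical
helper names encode_latin1/decode_latin1: text uses the proper Greek
mu internally, and MICRO SIGN appears only in the Latin-1 bytes.

```python
import unicodedata

# Latin-1 has MICRO SIGN (0xB5) but no GREEK SMALL LETTER MU.
TO_LATIN1_SIGN = {"\u03bc": "\u00b5"}

def encode_latin1(text: str) -> bytes:
    """Encode to Latin-1, substituting MICRO SIGN for mu only at this boundary."""
    text = unicodedata.normalize("NFC", text)
    return "".join(TO_LATIN1_SIGN.get(ch, ch) for ch in text).encode("latin-1")

def decode_latin1(data: bytes) -> str:
    """Decode Latin-1 and fold MICRO SIGN back to the ordinary Greek letter."""
    return data.decode("latin-1").replace("\u00b5", "\u03bc")

try:
    "\u03bc".encode("latin-1")
except UnicodeEncodeError:
    print("mu itself is not in Latin-1")

raw = encode_latin1("5 \u03bcm")
print(raw)                                 # b'5 \xb5m'
print(decode_latin1(raw) == "5 \u03bcm")   # True
```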

                Kind regards
                /kent k

> -----Original Message-----
> From: John O'Conner [mailto:John.Oconner@eng.sun.com]
> Sent: Thursday, June 22, 2000 12:15 AM
> To: Unicode List
> Subject: Case mapping errors?
>
>
> There are 5 characters that are giving me a little discomfort
> because of their case mappings:
>
> * U+00B5 MICRO SIGN
> * U+1FBE GREEK PROSGEGRAMMENI
> * U+2126 OHM SIGN
> * U+212A KELVIN SIGN
> * U+212B ANGSTROM SIGN
>
> Each of these has case mappings...and I really don't understand why.
> It appears that all of these have no "round-trip" capability to map
> back from another case. I suppose this can be argued for a lot of
> mappings.
>
> The most difficult cases are 2126, 212A, and 212B. These characters
> are "letter-like" in their glyph appearance, but it seems that their
> actual semantics are not. It seems like someone may have looked at
> KELVIN SIGN, for example, decided it looked like a Latin-1 'K' and
> gave it the same lowercase mapping. Still, would you really expect to
> lowercase a KELVIN SIGN to a small 'k'? I can't imagine...but I may
> not be as imaginative as some. I have the same argument for OHM SIGN
> and ANGSTROM SIGN. Although they have case mappings, are they expected
> by most people? If I were using the OHM, ANGSTROM, or KELVIN SIGN in
> my work, I would be very surprised if a case operation changed
> them...maybe I would be disappointed or frustrated even. Are these
> bugs in the spec? Or do I just need to think about them a little
> differently?
>
> Best regards,
> John O'Conner
>
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT