Re: Greek Characters Duplicated as Latin from Asmus Freytag on 2011-08-14 (Unicode Mail List Archive)

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Sun, 14 Aug 2011 14:36:14 -0700

On 8/14/2011 1:39 PM, Richard Wordingham wrote:
>
> U+00B5 MICRO SIGN is an ISO-8859-1 character, and was therefore
> included as U+00B5. It normally precedes a Latin-script letter, and
> therefore it actually makes sense to treat it as a Latin-script
> character, and possibly give it a different shape in these contexts to
> the shape of the Greek letter in Greek text.

I don't think that there's a strong and overriding reason to give this
character a separate shape.

As you note, the true reason that this character was encoded separately
has to do with the requirement that the first 256 code points of Unicode
should match 8859-1, so that simply "widening" a byte to 16 or 32 bits
would transform 8859-1 data to UTF-16 or UTF-32. With the predominance
of UTF-8 as the format for interchanging Unicode, something that wasn't
foreseen at the beginning, this design criterion has lost some of its
importance. However, it helped the migration to Unicode by making
conversion of the vast majority of data (at the time, ASCII and 8859-1
accounted for the bulk of existing data on the net) dead simple.
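
To illustrate the "widening" described above, here is a minimal Python
sketch. The helper name widen_latin1 and the sample bytes are made up for
the example; the only facts it relies on are that ISO 8859-1 byte values
equal the first 256 Unicode code points and that UTF-8 encodes U+00B5 as
two bytes.

```python
def widen_latin1(data: bytes) -> str:
    """Treat every ISO 8859-1 byte value as the Unicode code point of the same number."""
    return "".join(chr(b) for b in data)


# "5 µm" in ISO 8859-1: the micro sign is the single byte 0xB5 (sample data, assumed).
sample = bytes([0x35, 0x20, 0xB5, 0x6D])
widened = widen_latin1(sample)

# Widening gives the same result as a real ISO 8859-1 decoder.
same_as_decoder = widened == sample.decode("latin-1")

# UTF-16 (big-endian) is each original byte preceded by a zero byte.
utf16 = widened.encode("utf-16-be")
zero_padded = b"".join(bytes([0, b]) for b in sample)

# UTF-8 is not a simple widening: U+00B5 becomes the two bytes C2 B5.
utf8 = widened.encode("utf-8")

print(same_as_decoder, utf16 == zero_padded, utf8.hex())
```

The last line is the reason the criterion matters less today: data
interchanged as UTF-8 has to be transcoded anyway once it leaves ASCII.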

With anything as radically different from its predecessors as Unicode,
keeping as much familiarity as possible was a major concern.

Now, once you list the small mu among the first 256 characters, you then
have to ask what to do with the Greek alphabet. The basic
alphabets are used in so many ways in software (for automatic numbering
of headings, etc.) that disrupting this sequence (and leaving out the mu
from the Greek alphabet) wasn't a realistic choice.

Hence, the duplication.

This does not alter the fact that the "micro sign" really is just a use
of the Greek small mu, and not actually a new entity.
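
The relationship between the two code points can be checked against the
Unicode Character Database, for example with Python's standard
unicodedata module. This is only a sketch; the constant names MICRO and
MU are chosen for the example.

```python
import unicodedata

MICRO = "\u00b5"  # MICRO SIGN, inherited from ISO 8859-1
MU = "\u03bc"     # GREEK SMALL LETTER MU, in the Greek block

names = (unicodedata.name(MICRO), unicodedata.name(MU))

# The micro sign carries a compatibility decomposition to the Greek letter.
decomposition = unicodedata.decomposition(MICRO)

# Compatibility normalization (NFKC) folds the micro sign into mu;
# canonical normalization (NFC) leaves it untouched.
nfkc = unicodedata.normalize("NFKC", MICRO)
nfc = unicodedata.normalize("NFC", MICRO)

print(names, decomposition, nfkc == MU, nfc == MICRO)
```

Because the mapping is only a compatibility one, a process that wants to
treat the two as the same letter (search, for instance) has to ask for
that folding explicitly.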

Because the micro sign was widely implemented in systems and fonts that
do not support the full set of Greek characters, I wouldn't be surprised
to find that there are instances where the design was adjusted to make
it "fit" better in a Latin environment. If so, these developments likely
predate Unicode substantially, because this use of mu was supported in
older technology as well. I recall seeing it on a (mechanical)
typewriter keyboard.

I'm not sure I agree with the need to have a "Latinized" mu, but it
exists and there you have it. Having two separate code points will allow
these characters to have a separate development in the future.
>
>
>
> U+2126 OHM SIGN is similar to U+00B5 MICRO SIGN, except that it is used
> on its own. Whether it should be merged with U+03A9 GREEK CAPITAL
> LETTER OMEGA is debatable, but that is what has been done.
>
>
The Ohm sign should have been encoded as another example of "squared"
letters and abbreviations. It comes from Asian character sets, where,
inexplicably, it exists separately from and alongside the capital
Greek Omega - which they also encode.

In order to allow loss-less conversion to/from these sets, there was a
need to have a code point for the "Ohm".

The Omega for Ohm was never as widely used as the mu, and it's
questionable whether there really was much of a development of a
different form for it. The Asian fonts that I knew in the 80's did not
have different forms.

In modern usage, for new documents, this character should not be used.
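
For existing text that still contains U+2126, one possible cleanup is
canonical normalization, since the Ohm sign has a canonical (singleton)
decomposition to the Greek capital Omega. The sketch below uses Python's
unicodedata module; the function name prefer_greek_omega and the sample
string are made up for the example.

```python
import unicodedata

OHM = "\u2126"    # OHM SIGN
OMEGA = "\u03a9"  # GREEK CAPITAL LETTER OMEGA


def prefer_greek_omega(text: str) -> str:
    """Return text in NFC, which replaces U+2126 with U+03A9.

    Note that NFC also normalizes everything else in the string,
    not only the Ohm sign.
    """
    return unicodedata.normalize("NFC", text)


# Unlike the micro sign, the Ohm sign decomposes canonically (no <compat> tag).
decomposition = unicodedata.decomposition(OHM)

legacy = "10 k" + OHM  # sample legacy text (assumed)
cleaned = prefer_greek_omega(legacy)

print(decomposition, cleaned == "10 k" + OMEGA)
```

This is the practical difference from the micro sign: any normalized
text already loses the Ohm sign, whereas U+00B5 survives NFC and NFD.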

A./
Received on Sun Aug 14 2011 - 16:38:33 CDT
