Tim Partridge commented:
> In message <9711281918.AA03595@unicode.org> Doug Ewell recently said:
> > As I read this, I keep thinking about U+03C2 GREEK SMALL LETTER FINAL
> > SIGMA and U+03C3 GREEK SMALL LETTER SIGMA. If I am not mistaken,
> > these are indeed just presentation variants, and there is indeed a
> > straightforward rule (end-of-word) to determine which glyph should be
> > displayed. So, strictly speaking, this principle would seem to point
> > to the unification of U+03C2 and U+03C3 (and several similar pairs in
> > the Hebrew block, for that matter).
> There is a straightforward rule for Greek, as you state. But applying
> this as a default could cause problems when Greek letters are used
> in mathematics. Sigma is used in statistics to represent standard
> deviation. I'm surprised Unicode doesn't have a separate code point
> for this considering its obsession with U+00B5 micro sign, U+2126 ohm sign,
> U+2135 alef symbol (Hebrew, not Greek) etc. Perhaps it's because the others
> are designators / constants rather than variables.
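The end-of-word rule mentioned in the quoted exchange can be sketched as a small Python helper (the helper itself is hypothetical, purely for illustration): a U+03C3 sigma is rewritten as U+03C2 final sigma whenever no letter follows it.

```python
# Sketch of the Greek final-sigma presentation rule: a sigma that ends
# a word takes the final form U+03C2; elsewhere it keeps the medial
# form U+03C3. Illustrative only, not a full Unicode case algorithm.
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA

def apply_final_sigma(text: str) -> str:
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch == SIGMA:
            # Final form when no letter follows (end of word or of text).
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            if not nxt.isalpha():
                chars[i] = FINAL_SIGMA
    return "".join(chars)

# "Odysseus": the two medial sigmas stay, the trailing one becomes final.
print(apply_final_sigma("\u03bf\u03b4\u03c5\u03c3\u03c3\u03b5\u03c5\u03c3"))
```

The full Unicode case-mapping rule (the Final_Sigma condition) also looks backward for a preceding letter; the sketch above handles only the simple word-final case.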
It doesn't have anything to do with their mathematical function, but
rather with the source encoded character sets that were in consideration
when the original collection of characters was pulled together.
U+00B5 MICRO SIGN is in ISO/IEC 8859-1 ("Latin-1"), and in the preexisting
ISO nomenclature was distinguished from U+03BC GREEK SMALL LETTER MU. There
was no choice but to provide for separate encoding in Unicode.
The encoding for Greek Sigma is dependent on the preexisting encoding of
ISO/IEC 8859-7, as well as the Greek encodings from which that standard is
derived.
U+2126 OHM SIGN, U+2135 ALEF SYMBOL, and the other letterlike symbols that
ended up with separate encodings likewise had preexisting separate treatment
in at least one source encoded character set.
The implication that Unicode has an obsession with making such distinctions
based on functional status is incorrect. If anything, a careful reading
of UnicodeData-2.0.14.txt, available on the website, will demonstrate that
the letterlike symbols which otherwise appear to be identical to existing
letters in an encoded alphabet are all given canonical equivalences to
those other letters. This is the UTC's way of dealing with the fact that
we have to live with the dual encodings, for compatibility with other
encoded character sets, while realizing that cloning such characters based
on mathematical (or other) function is not a good idea.
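The equivalences described above can be inspected directly with Python's standard unicodedata module (the decomposition data below comes from the current Unicode Character Database; note that in current data the ohm sign's equivalence is canonical, while the micro and alef symbols carry compatibility decompositions):

```python
import unicodedata

# U+2126 OHM SIGN has a canonical decomposition to U+03A9 GREEK CAPITAL
# LETTER OMEGA, so plain NFC normalization already folds the pair.
print(unicodedata.decomposition("\u2126"))                 # "03A9"
print(unicodedata.normalize("NFC", "\u2126") == "\u03a9")  # True

# U+00B5 MICRO SIGN and U+2135 ALEF SYMBOL carry compatibility
# decompositions instead; NFC leaves them alone, NFKC folds them.
print(unicodedata.decomposition("\u00b5"))                 # "<compat> 03BC"
print(unicodedata.normalize("NFKC", "\u00b5") == "\u03bc") # True
print(unicodedata.normalize("NFKC", "\u2135") == "\u05d0") # True
```

This is exactly the compromise the note describes: the duplicate code points remain for round-trip compatibility with the source character sets, and the equivalence data records which "real" letter each one duplicates.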