From: Peter Kirk (firstname.lastname@example.org)
Date: Thu Mar 03 2005 - 05:06:54 CST
On 03/03/2005 04:41, Jony Rosenne wrote:
>With Hyphen-Minus Unicode did right - there are separate hyphen and minus
Jony and Gregg, perhaps it would help you to understand this if we
consider the less obscure Greek and Coptic situation.
A "Greek and Coptic" alphabet has been encoded in Unicode since its
early days. In Unicode 4.1 (I think) a separate Coptic alphabet will be
added, but not a separate Greek alphabet. The old "Greek and Coptic"
characters will continue to be used for Greek. Let us call the old Greek
and Coptic characters GC, and the new Coptic ones C.
In Unicode 4.0, any GC character is unambiguously ambiguous. In other
words, apart from any context it can certainly be interpreted as
ambiguous between Greek and Coptic.
As from Unicode 4.1, a C character will be unambiguously Coptic, but
there is a new uncertainty with a GC character: is it from legacy data
which is ambiguous between Greek and Coptic, or is it from new data
which is unambiguously Greek? Such uncertainty affects spelling checkers.
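To make the 4.0 versus 4.1 contrast concrete, here is a minimal Python sketch of what a process can conclude from a single character with no context. It assumes only the Unicode block ranges: "Greek and Coptic" at U+0370..U+03FF and the new "Coptic" block at U+2C80..U+2CFF. The function name `classify` is hypothetical. One refinement the sketch includes: the Coptic-specific letters U+03E2..U+03EF stay in the older block and are still used for Coptic, so they are treated as Coptic here.

```python
# Block ranges from the Unicode Standard.
GREEK_AND_COPTIC = range(0x0370, 0x0400)  # the old "GC" characters
COPTIC = range(0x2C80, 0x2D00)            # the "C" block added in Unicode 4.1
# Coptic-specific letters (SHEI..DEI) left in the old block and still used
# for Coptic; the 4.1 Coptic block does not duplicate them.
COPTIC_IN_GC = range(0x03E2, 0x03F0)


def classify(ch: str) -> str:
    """Report what one character, taken out of context, says about its script."""
    cp = ord(ch)
    if cp in COPTIC or cp in COPTIC_IN_GC:
        return "Coptic"
    if cp in GREEK_AND_COPTIC:
        # New data: Greek. Legacy data: Greek or Coptic.
        # The character alone cannot say which.
        return "Greek, or legacy Greek/Coptic"
    return "other"


for ch in ["\u03b1", "\u2c81", "\u03e2"]:
    print(f"U+{ord(ch):04X}", classify(ch))
```

The middle branch is the new uncertainty: the same code point answers differently depending on when and by whom the text was written, and nothing in the character records that.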
The new uncertainty could have been resolved by adding a new alphabet of
Greek-only characters G, and this seems to be Dean's and Jony's preferred
approach in principle. These characters would indeed have been
unambiguous, but at what price? The current Greek and Coptic
characters are in widespread use in Greek text in Greece and Cyprus, as
well as by speakers and scholars of Greek worldwide. In comparison, the
use of Coptic is minuscule (and I don't mean in the typographic sense).
What purpose would have been served by introducing a new set of
unambiguous characters? Considering how little actual use of Coptic there
has been, almost none. But achieving it would have required massive
disruption for existing users of Greek. There is also a huge store of
existing text using GC characters which will continue unchanged. Because
such text would need to be searched alongside text using the new G
characters for the indefinite future, search and similar processes would
need to treat GC and G characters as equivalent, which would largely
defeat the object of encoding the separate G characters.
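The search cost can be sketched in Python. The separate Greek-only alphabet G was never encoded, so everything about it here is hypothetical: the sketch pretends G mirrors U+0370..U+03FF at invented Private Use Area code points U+E370..U+E3FF, and `fold_g_to_gc` and `matches` are illustrative names only. The point is that a search has to fold G back to GC before comparing, which is exactly the equivalence that would undo the distinction.

```python
# HYPOTHETICAL: no Greek-only "G" alphabet exists in Unicode. We pretend it
# mirrors U+0370..U+03FF at U+E370..U+E3FF (invented Private Use code points).
G_OFFSET = 0xE000
GC_START, GC_END = 0x0370, 0x0400


def fold_g_to_gc(text: str) -> str:
    """Map each hypothetical G character back to its GC counterpart."""
    out = []
    for ch in text:
        cp = ord(ch)
        if GC_START + G_OFFSET <= cp < GC_END + G_OFFSET:
            out.append(chr(cp - G_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def matches(query: str, document: str) -> bool:
    """Fold both sides, or legacy GC text is never found."""
    return fold_g_to_gc(query) in fold_g_to_gc(document)


legacy = "\u03bb\u03cc\u03b3\u03bf\u03c2"  # a Greek word in existing GC characters
new_g = "".join(chr(ord(c) + G_OFFSET) for c in legacy)  # same word in hypothetical G
print(new_g in legacy, matches(new_g, legacy))  # False True
```

Without the fold, a query typed in G would never find the legacy GC text; with it, G and GC are treated as the same characters.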
So this is a case where practicality needs to take precedence over what
some might consider to be theoretically preferable. And in my judgment
the same applies to the QAMATS and HOLAM disunifications, where there is
also a large body of existing text using the old character, and the
relative proportion of use of the new character is tiny.
>Not in this case. But we are told that the presence of Qamats Qatan in the
>text means that any Qamats in it is a Qamats Gadol.
No, it does not mean this. For better or for worse, the situation seems
to be that the old qamats character will continue to be ambiguous in any
context. In this case, it seems to be for the better, because the great
majority of users want to continue to use the old qamats character
ambiguously, and the distinct qamats qatan is for use only by a few
people who see a special need to make the distinction explicit.
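A minimal Python sketch of the point about qamats, assuming the two code points involved: U+05B8 HEBREW POINT QAMATS (the old character) and U+05C7 HEBREW POINT QAMATS QATAN (added in Unicode 4.1). The function `qamats_readings` is hypothetical. It deliberately does not let a qamats qatan elsewhere in the text change the reading of an old qamats.

```python
QAMATS = "\u05b8"        # HEBREW POINT QAMATS: the old, ambiguous character
QAMATS_QATAN = "\u05c7"  # HEBREW POINT QAMATS QATAN: the explicit distinction


def qamats_readings(text: str) -> list[str]:
    """Report what each qamats point tells a reader, one character at a time."""
    readings = []
    for ch in text:
        if ch == QAMATS_QATAN:
            readings.append("qatan")
        elif ch == QAMATS:
            # Still ambiguous, even if QAMATS QATAN occurs elsewhere in the text.
            readings.append("gadol or qatan")
    return readings


print(qamats_readings("\u05d0\u05b8\u05d1\u05c7"))  # ['gadol or qatan', 'qatan']
```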
-- Peter Kirk email@example.com (personal) firstname.lastname@example.org (work) http://www.qaya.org/