Re: Ambiguity and disunification

From: Peter Kirk
Date: Thu Mar 03 2005 - 05:06:54 CST

    On 03/03/2005 04:41, Jony Rosenne wrote:

    >With Hyphen-Minus Unicode did right - there are separate hyphen and minus

    Jony and Gregg, perhaps it would help you to understand this if we
    consider the less obscure Greek and Coptic situation.

    A "Greek and Coptic" alphabet has been encoded in Unicode since its
    early days. In Unicode 4.1 (I think) a separate Coptic alphabet will be
    added, but not a separate Greek alphabet. The old "Greek and Coptic"
    characters will continue to be used for Greek. Let us call the old Greek
    and Coptic characters GC, and the new Coptic ones C.

    In Unicode 4.0, any GC character is unambiguously ambiguous. In other
    words, apart from any context it can certainly be interpreted as
    ambiguous between Greek and Coptic.

    As from Unicode 4.1, a C character will be unambiguously Coptic, but
    there is a new uncertainty with a GC character: is it from legacy data
    which is ambiguous between Greek and Coptic, or is it from new data
    which is unambiguously Greek? Such uncertainty affects spelling checkers

    The new uncertainty could have been resolved, and this seems to be
    Dean's and Jony's preferred approach in principle, by adding a new
    alphabet of Greek only characters G. These characters would indeed have
    been unambiguous, but at what price? The current Greek and Coptic
    characters are in widespread use in Greek text in Greece and Cyprus, as
    well as by speakers and scholars of Greek worldwide. In comparison, the
    use of Coptic is minuscule (and I don't mean in the typographic sense).
    What would have been served by introducing a new set of unambiguous
    characters? Considering how little actual use of Coptic there has been,
    almost nothing. But achieving this would have required massive
    disruption for existing users of Greek. There is also a huge store of
    existing text using GC characters which will continue unchanged. Since
    such text would need to be searched alongside text using the new G
    characters for the indefinite future, search and similar processes
    would need to treat GC and G characters as equivalent, which would
    largely defeat the object of encoding the separate G characters.
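
    The equivalence that search would have needed can be sketched in
    Python. This is only an illustration under stated assumptions: no
    Greek-only G characters were ever encoded, so the Private Use Area
    code points below are hypothetical stand-ins, and the names G_TO_GC,
    fold and find are invented for this sketch.

```python
# Hypothetical stand-ins: no Greek-only "G" characters exist in Unicode.
# Private Use Area code points play that role here, for illustration only.
G_TO_GC = {
    0xE000: 0x03B1,  # hypothetical G alpha -> U+03B1 GREEK SMALL LETTER ALPHA
    0xE001: 0x03B2,  # hypothetical G beta  -> U+03B2 GREEK SMALL LETTER BETA
    0xE002: 0x03B3,  # hypothetical G gamma -> U+03B3 GREEK SMALL LETTER GAMMA
}


def fold(text: str) -> str:
    """Map every hypothetical G character to its existing GC counterpart."""
    return text.translate(G_TO_GC)


def find(needle: str, haystack: str) -> int:
    """Search with G and GC treated as equivalent; return the index or -1."""
    # The mapping is one-to-one, so indices in the folded text match the original.
    return fold(haystack).find(fold(needle))


legacy_text = "\u03b1\u03b2\u03b3"  # existing data, written with GC characters
new_query = "\ue000\ue001\ue002"    # same letters as hypothetical G characters

print(new_query == legacy_text)      # False: the raw strings differ
print(find(new_query, legacy_text))  # 0: after folding they match
```

    Once both sides are folded, the search can no longer tell G from GC,
    which is the sense in which the separate encoding would have gained
    little.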

    So this is a case where practicality needs to take precedence over what
    some might consider to be theoretically preferable. And in my judgment
    the same applies to the QAMATS and HOLAM disunifications, where there is
    also a large body of existing text using the old character, and the
    relative proportion of use of the new character is tiny.


    >Not in this case. But we are told that the presence of Qamats Qatan in the
    >text means that any Qamats in it is a Qamats Gadol.

    No, it does not mean this. For better or for worse, the situation seems
    to be that the old qamats character will continue to be ambiguous in any
    context. In this case, it seems to be for the better, because the great
    majority of users want to continue to use the old qamats character
    ambiguously, and the distinct qamats qatan is for use only by a few
    people who see a special need to make the distinction explicit.

    Peter Kirk
