Re: Normalisation stability, was: Compression through normalization

From: Peter Kirk
Date: Tue Nov 25 2003 - 14:07:48 EST

    On 25/11/2003 10:03, John Cowan wrote:

    >... And as for
    >canonical equivalence, the most efficient way to compare strings for
    >it is to normalize both of them in some way and then do a raw
    >binary compare. Since it adds efficiency to normalize only once,
    >it is worthwhile to define a few normalization forms and urge
    >people to produce text in one of them, so that receivers need not
    >normalize but need only check for normalization, typically much cheaper.

    If receivers are expected to check for normalisation, they are
    presumably also expected to normalise when the check fails. If they do
    not, they are in conflict with conformance clause C9: at least with the
    "ideally" of its last paragraph, and probably with the principle that "no
    process can assume that another process will make a distinction between
    two different, but canonical-equivalent character sequences". The
    efficiency gain arises because the great majority of received strings
    are expected to be already normalised. But the system must still be able
    to cope with a small proportion of non-normalised strings. So if
    combining classes are changed in such a way that the normalised form of
    certain rare or anomalous strings is not preserved, the system can cope:
    those strings simply fail the check and are normalised like any other
    non-normalised input. Thus the argument from normalisation stability
    against changing combining classes also fails, at least where the
    changes affect rare or obscure characters, or combinations of
    characters, which are little used in existing texts. One example, if
    Doug will forgive me, is Hebrew points. There may well be others.
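
    To make the "check first, normalise only on failure" pattern concrete,
    here is a minimal sketch in Python. It assumes Python 3.8 or later (for
    unicodedata.is_normalized) and uses NFC as the agreed form; the function
    names ensure_normalized and canonically_equal are hypothetical, chosen
    only for this illustration. The Hebrew example relies on the combining
    classes in current Unicode data (dagesh 21, shin dot 24).

```python
import unicodedata


def ensure_normalized(s: str, form: str = "NFC") -> str:
    """Return s in the given normalisation form.

    The quick check handles the common case (input already normalised)
    without building a new string; only the rare failures pay for a
    full normalisation pass.
    """
    if unicodedata.is_normalized(form, s):
        return s
    return unicodedata.normalize(form, s)


def canonically_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings for canonical equivalence.

    Both sides are brought to the same normalisation form, then
    compared code point by code point.
    """
    return ensure_normalized(a, form) == ensure_normalized(b, form)


if __name__ == "__main__":
    # Precomposed e-acute versus e followed by a combining acute accent.
    print(canonically_equal("\u00e9", "e\u0301"))

    # Hebrew shin with shin dot and dagesh in two different orders.
    # Canonical ordering sorts the marks by combining class, so both
    # sequences normalise to the same string.
    print(canonically_equal("\u05e9\u05c1\u05bc", "\u05e9\u05bc\u05c1"))
```

    A receiver built this way never depends on the sender having
    normalised: a string that fails the check is simply normalised on
    receipt, whatever the reason it failed.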

    So, it seems that Unicode has bound itself by its stability policy to
    something which is both unnecessary and in fundamental conflict with its
    own conformance clause C9. I urge reconsideration of the policy.

    Peter Kirk
