Re: Does Unicode 4.1 change NFC?

From: Peter Kirk
Date: Mon Apr 04 2005 - 15:27:50 CST

    On 04/04/2005 19:02, Kenneth Whistler wrote:

    >Peter Kirk continued:
    >>In that case these character allocations seem perverse, given that both
    >>of these characters could have been assigned to the BMP, or both to
    >>outside it
    >Perverse it may be, but there is no point in casting implied
    >asperversions at the UTC.

    Well, I didn't name the UTC, but thanks for the clarification.

    >Crying "security hole!" seems to be the Fad Of The Month on the
    >Unicode list, but this isn't one of them.
    >In any conformant Unicode 4.0.1 (or earlier) version of normalization,
    >U+FACF normalizes to (tada!) U+FACF. If it doesn't, the normalizer
    >isn't conformant. If sending U+FACF to such a normalizer crashes
    >an application, then shame on the programmer.

    The problem will of course come when new UCD data is fed into an old
    normaliser. You have made much in the past of the need not to change the
    normalisation algorithm and not to add new classes of exceptions, so
    that programs don't have to be rewritten for each new version and only
    the data needs to be updated. The sort of outcome I might well expect to
    see from this is a normaliser emitting surrogate pairs in UTF-8 or
    UTF-32.
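
    To make that failure mode concrete, here is a minimal Python sketch. It
    assumes a hypothetical normaliser that keeps its output in 16-bit units
    and a hypothetical serialiser, naive_utf8, that treats each unit as a
    whole character. Neither is taken from any real implementation.

```python
def to_utf16_units(cp):
    """Split a code point into UTF-16 code units (one or two)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def naive_utf8(units):
    """Hypothetical buggy serialiser: encodes each 16-bit unit on its own,
    as code written with only BMP output in mind might do."""
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


units = to_utf16_units(0x2284A)      # the 4.1.0 mapping target of U+FACF
bad = naive_utf8(units)              # surrogates leaked into "UTF-8"
good = chr(0x2284A).encode("utf-8")  # what a correct encoder emits

print([hex(u) for u in units])  # ['0xd84a', '0xdc4a']
print(bad.hex())                # eda18aedb18a
print(good.hex())               # f0a2a18a
```

    A strict UTF-8 decoder rejects the six-byte form, so the damage would
    show up downstream of the normaliser rather than in it.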

    >In any conformant Unicode 4.1.0 version of normalization, U+FACF
    >normalizes to U+2284A. If it doesn't, the normalizer isn't
    >conformant. If sending U+FACF to such a normalizer crashes
    >an application, then shame on the programmer.
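
    The version dependence is easy to observe where two versions of the data
    sit side by side. A small sketch, assuming Python's standard unicodedata
    module, which bundles a current UCD plus a frozen Unicode 3.2.0 copy
    (ucd_3_2_0). Here 3.2.0 stands in for "4.0.1 or earlier", since U+FACF
    is unassigned in both.

```python
import unicodedata

src = "\uFACF"

# Frozen Unicode 3.2.0 data: U+FACF is unassigned, so it maps to itself.
old = unicodedata.ucd_3_2_0.normalize("NFC", src)

# Current data (4.1.0 or later): U+FACF has a singleton canonical mapping.
new = unicodedata.normalize("NFC", src)

print("current UCD version:", unicodedata.unidata_version)
print("old: U+%04X" % ord(old))  # old: U+FACF
print("new: U+%04X" % ord(new))  # new: U+2284A
```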

    Well, however much I say "shame on the programmer" to those who wrote
    the programs which allowed all those nasty viruses and worms of a couple
    of years ago to spread (I mean the programmers of mail clients and the
    like, not of the viruses), that doesn't change the fact that they cost
    various people millions of dollars.

    >There is a very good set of normalization test data available for
    >both Unicode 4.0.0 and now for Unicode 4.1.0. Anyone who puts
    >out an implementation of normalization that cannot pass the
    >appropriate version test deserves what they get.

    Indeed everyone should test their programs extensively for each new
    version. But will they? And if they don't, do their customers deserve
    what they get?
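
    For what it is worth, the check itself is not much work. Below is a
    rough Python sketch of the NFC part of such a test, assuming the
    five-column, semicolon-separated layout of NormalizationTest.txt. The
    helper names (parse_line, nfc_failures) are mine, and the sample lines
    are written by hand in that layout rather than copied from the file.

```python
import unicodedata


def parse_line(line):
    """Return columns c1..c5 as strings, or None for comment/@Part lines."""
    data = line.split("#", 1)[0].strip()
    if not data or data.startswith("@"):
        return None
    fields = data.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]


def nfc_failures(lines, normalize=unicodedata.normalize):
    """Collect lines that break the NFC invariants:
    c2 == NFC(c1) == NFC(c2) == NFC(c3) and c4 == NFC(c4) == NFC(c5)."""
    failures = []
    for line in lines:
        cols = parse_line(line)
        if cols is None:
            continue
        c1, c2, c3, c4, c5 = cols
        n1, n2, n3, n4, n5 = (normalize("NFC", c) for c in cols)
        if not (c2 == n1 == n2 == n3 and c4 == n4 == n5):
            failures.append(line)
    return failures


# Illustrative lines only, in the layout of the real test file.
sample = [
    "@Part1 # Character by character test",
    "00C5;00C5;0041 030A;00C5;0041 030A; # LATIN CAPITAL LETTER A WITH RING ABOVE",
    "FACF;2284A;2284A;2284A;2284A; # CJK COMPATIBILITY IDEOGRAPH-FACF",
]

# Current data passes; the frozen 3.2.0 data fails only the U+FACF line.
print(nfc_failures(sample))
print(nfc_failures(sample, unicodedata.ucd_3_2_0.normalize))
```

    Running the newer test lines against the older data flags exactly the
    new mapping, which is the mismatch I am worried about going unnoticed.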

    >In neither case is this a security hole *caused* by the allocation.

    Fair enough, but it is potentially laid open by the allocation. Programs
    can be a bit like minefields, full of bugs which might blow up on you at
    any time. Careful sweeping of the commonly used parts of the
    multidimensional data space has cleared out the bugs which are most
    likely to cause trouble. But in areas off the beaten track lurk
    unexploded bugs (to mix a metaphor), ready to blow up in your face as
    soon as you feed in novel kinds of data which cause the program to
    follow untested paths. That is the danger here.

    Peter Kirk
