Re: Does Unicode 4.1 change NFC?

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Apr 05 2005 - 03:33:26 CST


    On 05/04/2005 00:22, Kenneth Whistler wrote:

    >John Burger asked:
    >
    >
    >
    >>>>The problem will of course come when new UCD data is fed into an old
    >>>>normaliser.
    >>>>
    >>>>
    >>>Actually, it will not. If a Unicode normalizer was a Unicode 4.0
    >>>normalizer, it will *stay* a Unicode 4.0 normalizer.
    >>>
    >>>
    >>Even if it is fed new UCD data?
    >>
    >>
    >
    >It depends on what Peter Kirk meant by a "normaliser" and
    >by "UCD data".
    >
    >If by "normaliser" he means a normalizer generator that takes
    >UCD data files as input and generates a normalizer process that
    >corresponds to the version of UCD data files, then of course
    >what you input matters.
    >
    >If by "normaliser" he means an already implemented normalizer
    >process and by "new UCD data" he means text data corresponding
    >to the new version of Unicode, then the behavior of the
    >normalizer should not change.
    >
    >

    What I mean is a program which properly separates program from data:
    it implements the Unicode normalisation *algorithm* (for a particular
    version of Unicode) but takes the Unicode character *data*, as well as
    the text to be normalised, as input. I don't know of any normalisation
    program which works in this way, and here efficiency may override good
    programming practice, although it should be possible to compile the
    UCD normalisation data into a form which can be used efficiently. But
    I do know of other programs which effectively update themselves
    automatically with the latest version of the UCD.
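
    To make the separation concrete, here is a minimal sketch in Python of
    a normaliser which takes its character data as input. It is an
    illustration only, not any existing program: the names load_ucd, nfd
    and SAMPLE_UCD are hypothetical, the three data lines merely follow
    the UnicodeData.txt field layout, and it covers only canonical
    decomposition and reordering (NFD). Hangul and the composition step
    needed for NFC are left out.

```python
# Three lines in UnicodeData.txt field layout (illustrative sample only).
SAMPLE_UCD = [
    "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;;;00C9;;00C9",
    "0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;;;;;",
    "0323;COMBINING DOT BELOW;Mn;220;NSM;;;;;N;;;;;",
]


def load_ucd(lines):
    """Parse UnicodeData.txt-style lines into (ccc, decomp) tables."""
    ccc, decomp = {}, {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 6:
            continue
        cp = int(fields[0], 16)
        # Field 3 is the canonical combining class; store non-zero ones.
        if fields[3] != "0":
            ccc[cp] = int(fields[3])
        # Field 5 is the decomposition; a leading <tag> marks a
        # compatibility mapping, which canonical forms must ignore.
        mapping = fields[5]
        if mapping and not mapping.startswith("<"):
            decomp[cp] = [int(x, 16) for x in mapping.split()]
    return ccc, decomp


def nfd(text, ccc, decomp):
    """Canonical decomposition and reordering, driven only by the tables."""
    out = []

    def expand(cp):
        # Apply canonical mappings recursively until none is left.
        if cp in decomp:
            for part in decomp[cp]:
                expand(part)
        else:
            out.append(cp)

    for ch in text:
        expand(ord(ch))

    # Canonical ordering: stable-sort each run of non-starters by class.
    i = 0
    while i < len(out):
        if ccc.get(out[i], 0) == 0:
            i += 1
            continue
        j = i
        while j < len(out) and ccc.get(out[j], 0) != 0:
            j += 1
        out[i:j] = sorted(out[i:j], key=lambda c: ccc.get(c, 0))
        i = j
    return "".join(map(chr, out))
```

    With these tables, nfd("\u00e9", ...) gives "e" followed by U+0301;
    with empty tables the same code returns its input unchanged. The
    output depends on the data as much as on the code, which is exactly
    why feeding new data to an old program matters.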

    Of course if the algorithm is changed from one version of Unicode to
    another, as it was when NormalizationCorrections.txt was added to the
    standard, then the program needs to be updated, and the results of using
    the new UCD data with the old algorithm are unlikely to be correct. But
    from 4.0.0 to 4.1.0 there has not, I think, been an advertised change to
    the algorithm, and so people might expect the normalisation program to
    continue to work. I agree that they should test it before use with a new
    version of Unicode, but I don't believe that all programmers are as
    careful as Doug and Jill in such matters.
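
    One cheap safeguard, sketched here under assumptions, is for such a
    program to refuse data from a version it has not been tested against.
    TESTED_VERSIONS and require_tested_version are hypothetical names;
    unicodedata.unidata_version is the real attribute by which Python
    reports the version of its own bundled tables.

```python
import unicodedata

# Hypothetical: the UCD versions this normaliser was actually tested with.
TESTED_VERSIONS = {"4.0.0", "4.0.1"}


def require_tested_version(data_version, tested=TESTED_VERSIONS):
    """Return data_version if it was tested, otherwise raise."""
    if data_version not in tested:
        raise RuntimeError(
            "UCD %s has not been tested with this normaliser" % data_version
        )
    return data_version


# For comparison: the version of the tables bundled with this Python.
BUILTIN_VERSION = unicodedata.unidata_version
```

    A self-updating program could make this check before swapping in
    newly downloaded data files, and keep the old files if it fails.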

    There is a particular danger with the new fashion of programs
    automatically updating themselves over the Internet - and sometimes
    breaking themselves in the process, as I have discovered to my cost.

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/


    This archive was generated by hypermail 2.1.5 : Tue Apr 05 2005 - 03:40:12 CST