From: Andrew C. West (email@example.com)
Date: Tue Apr 05 2005 - 05:25:20 CST
I have been listening with increasing incredulity to Peter's claims that the
Unicode standard should be constrained by *theoretical* problems resulting from
invalid assumptions on the part of bad programmers. By the same reasoning no
international standard should mandate four-digit years, as "bad programmers"
used to be in the habit of storing years as two digits only, on the mistaken
assumption that the end of the world was more likely than humankind managing to
survive to the next century.
On Tue, 05 Apr 2005 10:33:26 +0100, Peter Kirk wrote:
> What I mean is a program which makes a proper separation between program
> and data, which implements the Unicode normalisation *algorithm* (for a
> particular version of Unicode) but uses the Unicode character *data*, as
> well as the text data to be normalised, as part of its input. I don't
> know of any normalisation program which works in this way, and in this
> case efficiency may override good programming practice
As any "good" software engineer knows, bad programming practice is never justified by concerns about efficiency.
> - although it
> should be possible to compile the UCD normalisation data in a way which
> can be used efficiently. But I do know of other programs which
> effectively update themselves automatically with the latest version of
> the UCD.
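Something like the separation Peter describes could be sketched as follows (a minimal sketch in Python, not anyone's actual implementation: the NFD *algorithm* is fixed in code while the character *data* is read from a local copy of UnicodeData.txt at run time; the file path is an assumption, and Hangul decomposition and canonical composition are omitted for brevity):

```python
# Sketch of algorithm/data separation for normalization: the code below
# implements canonical decomposition (NFD) and takes the UCD as input.
# Assumes a local UnicodeData.txt; Hangul and composition are omitted.

def load_ucd(path="UnicodeData.txt"):
    ccc = {}     # code point -> canonical combining class (nonzero only)
    decomp = {}  # code point -> canonical decomposition (tuple of code points)
    with open(path, encoding="ascii") as f:
        for line in f:
            fields = line.split(";")
            cp = int(fields[0], 16)
            if fields[3] != "0":
                ccc[cp] = int(fields[3])       # field 3: combining class
            d = fields[5]                       # field 5: decomposition
            if d and not d.startswith("<"):     # canonical, not compatibility
                decomp[cp] = tuple(int(x, 16) for x in d.split())
    return ccc, decomp

def nfd(text, ccc, decomp):
    out = []
    def expand(cp):
        # Recursively apply canonical decompositions.
        if cp in decomp:
            for d in decomp[cp]:
                expand(d)
        else:
            out.append(cp)
    for ch in text:
        expand(ord(ch))
    # Put combining marks into canonical order (simple bubble pass:
    # swap adjacent marks when both are nonzero and out of order).
    i = 1
    while i < len(out):
        a, b = ccc.get(out[i - 1], 0), ccc.get(out[i], 0)
        if b != 0 and a > b:
            out[i - 1], out[i] = out[i], out[i - 1]
            i = max(i - 1, 1)
        else:
            i += 1
    return "".join(map(chr, out))
```

The point of the exercise is that a new UCD can be dropped in without touching the code, which is exactly why it must be retested against the normalization test data whenever that happens.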
> Of course if the algorithm is changed from one version of Unicode to
> another, as it was when NormalizationCorrections.txt was added to the
> standard, then the program needs to be updated, and the results of using
> the new UCD data with the old algorithm are unlikely to be correct. But
> from 4.0.0 to 4.1.0 there has not, I think, been an advertised change to
> the algorithm, and so people might expect the normalisation program to
> continue to work.
As a closet software engineer I have some experience in both writing and testing
software, and I would suggest that any normalization software that is not fully
retested when it is updated to a new version of Unicode should be avoided like
the plague. In my own implementation of normalization I do not assume that 16-bit characters will be normalized to 16-bit values, and I did not expect my implementation to be broken by the new version of Unicode, yet for the sake of good programming practice I did fully retest against the normalization test data ... which was a good thing, as it did unexpectedly fail the first time round, but that turned out to be due to PRI-29, an advertised change to the normalization algorithm that I had not been aware of.
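For anyone who wants to run the same kind of retest, the checks against NormalizationTest.txt are easy to script. A minimal sketch in Python, shown here for the NFD columns only and using the interpreter's own unicodedata tables rather than my implementation; it assumes the caller has already skipped blank lines, comment lines and "@Part" lines:

```python
import unicodedata

def parse_test_line(line):
    """Split one NormalizationTest.txt record into its five column strings.
    Each column is a sequence of space-separated hex code points."""
    fields = line.split("#")[0].split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]

def check_nfd(c1, c2, c3, c4, c5):
    # NFD conformance for one record:
    #   c3 == NFD(c1) == NFD(c2) == NFD(c3) and c5 == NFD(c4) == NFD(c5)
    nfd = lambda s: unicodedata.normalize("NFD", s)
    return (c3 == nfd(c1) == nfd(c2) == nfd(c3)
            and c5 == nfd(c4) == nfd(c5))
```

For example, `check_nfd(*parse_test_line("1E0A;1E0A;0044 0307;1E0A;0044 0307;"))` checks the record for U+1E0A. Note that what this actually exercises is whichever version of the UCD the Python build was compiled against, which is rather the point of the whole discussion.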
> I agree that they should test it before use with a new
> version of Unicode, but I don't believe that all programmers are as
> careful as Doug and Jill in such matters.
Then you should be careful not to buy any software from such people.
> There is a particular danger with the new fashion of programs
> automatically updating themselves over the Internet - and sometimes
> breaking themselves in the process, as I have discovered to my cost.
I know of applications that automatically update simple lists of Unicode data over the Internet (e.g. Microsoft's Keyboard Layout Creator, which can update the
Unicode name list from the Unicode site), but I suspect that at present no
application that does complex Unicode processing such as normalization simply
downloads a new copy of the UCD data. In the future it is quite possible that
applications will be able to download new versions of Unicode data files, and
rewrite themselves to store the new data internally, but any such application
would need to be tested even more thoroughly than an ordinary application, and
the programmers had better be damned sure that they are not making any unwarranted assumptions.
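One cheap safeguard for such a self-updating application would be to refuse any downloaded data file whose declared version it has not been tested against. A minimal sketch, assuming the file carries a version header of the form used by files such as DerivedNormalizationProps.txt ("# DerivedNormalizationProps-4.1.0.txt"; UnicodeData.txt itself has no such header); the set of tested versions is purely illustrative:

```python
import re

# Illustrative set of UCD versions this program has been retested against.
TESTED_VERSIONS = {"4.0.0", "4.0.1", "4.1.0"}

def header_version(first_line):
    # Extract "4.1.0" from a header like "# DerivedNormalizationProps-4.1.0.txt".
    m = re.search(r"-(\d+\.\d+\.\d+)\.txt", first_line)
    return m.group(1) if m else None

def safe_to_load(first_line):
    # Refuse files with no recognizable version, or an untested one.
    v = header_version(first_line)
    return v is not None and v in TESTED_VERSIONS
```

An application that silently swallowed a 5.0.0 file here would be making exactly the kind of unwarranted assumption I am complaining about.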
This archive was generated by hypermail 2.1.5 : Tue Apr 05 2005 - 05:27:14 CST