Re: Does Unicode 4.1 change NFC?

From: Peter Kirk ([email protected])
Date: Mon Apr 04 2005 - 15:27:50 CST

Next message: Kenneth Whistler: "Re: Does Unicode 4.1 change NFC?"

Previous message: [email protected]: "Re: Tamil Aytham and the role of Unicode names"
In reply to: Kenneth Whistler: "Re: Does Unicode 4.1 change NFC?"
Next in thread: Doug Ewell: "Re: Does Unicode 4.1 change NFC?"
Reply: Doug Ewell: "Re: Does Unicode 4.1 change NFC?"

On 04/04/2005 19:02, Kenneth Whistler wrote:

>Peter Kirk continued:
>
>>In that case these character allocations seem perverse, given that both
>>of these characters could have been assigned to the BMP, or both to
>>outside it
>
>Perverse it may be, but there is no point in casting implied
>asperversions at the UTC.
>

Well, I didn't name the UTC, but thanks for the clarification.

>...
>Crying "security hole!" seems to be the Fad Of The Month on the
>Unicode list, but this isn't one of them.
>
>In any conformant Unicode 4.0.1 (or earlier) version of normalization,
>U+FACF normalizes to (tada!) U+FACF. If it doesn't, the normalizer
>isn't conformant. If sending U+FACF to such a normalizer crashes
>an application, then shame on the programmer.
>
>

The problem will of course come when new UCD data is fed into an old
normaliser. You have made much in the past of the need not to change the
normalisation algorithm, not to add new classes of exceptions, etc., so
that programs don't have to be rewritten for each new version and only
the data needs to be updated. The sort of outcome I might well expect to see
from this is a normaliser emitting surrogate pairs in UTF-8 or UTF-32.
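
To make that failure mode concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any real normaliser: `naive_utf8` is a hypothetical serialiser that holds text as 16-bit code units and converts each unit to UTF-8 on its own. That is harmless while every mapping stays inside the BMP, and it yields encoded surrogates once a mapping such as U+FACF to U+2284A turns up in the data.

```python
def utf16_units(text):
    """Split text into 16-bit code units, as a BMP-oriented program might store it."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def naive_utf8(text):
    """Hypothetical buggy serialiser: treats every 16-bit unit as a code point.

    'surrogatepass' is used only so Python will reproduce the bad bytes.
    """
    return b"".join(
        chr(unit).encode("utf-8", "surrogatepass") for unit in utf16_units(text)
    )


def is_well_formed_utf8(data):
    """True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    target = "\U0002284A"  # the 4.1.0 mapping of U+FACF quoted in this thread
    print(naive_utf8(target).hex(), is_well_formed_utf8(naive_utf8(target)))
    print(target.encode("utf-8").hex(), is_well_formed_utf8(target.encode("utf-8")))
```

The naive path produces six bytes (two three-byte encoded surrogates) where well-formed UTF-8 needs four, and a strict decoder downstream rejects them.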

>In any conformant Unicode 4.1.0 version of normalization, U+FACF
>normalizes to U+2284A. If it doesn't, the normalizer isn't
>conformant. If sending U+FACF to such a normalizer crashes
>an application, then shame on the programmer.
>
>
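
The version dependence described in the two quoted paragraphs can be observed with Python's standard `unicodedata` module. Using that module is an assumption of this sketch, not something the thread mentions. `unicodedata.normalize` uses whatever UCD version the interpreter ships (reported by `unicodedata.unidata_version`), while `unicodedata.ucd_3_2_0` offers the same methods over Unicode 3.2 data, in which U+FACF is still unassigned. The helper names are hypothetical.

```python
import unicodedata

COMPAT = "\uFACF"       # BMP compatibility ideograph added in Unicode 4.1
UNIFIED = "\U0002284A"  # its canonical mapping, outside the BMP


def nfc_old(text):
    """NFC using the frozen Unicode 3.2 tables bundled with Python."""
    return unicodedata.ucd_3_2_0.normalize("NFC", text)


def nfc_current(text):
    """NFC using the interpreter's own UCD version (later than 4.1)."""
    return unicodedata.normalize("NFC", text)


def code_points(text):
    """Render a string as a list of U+XXXX labels."""
    return ["U+%04X" % ord(ch) for ch in text]


if __name__ == "__main__":
    print("current UCD:", unicodedata.unidata_version)
    print("old data:", code_points(nfc_old(COMPAT)))
    print("new data:", code_points(nfc_current(COMPAT)))
```

With the old data the character passes through unchanged; with the new data it is replaced by a supplementary-plane character, so the output path has to cope with code points above U+FFFF.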

Well, however often I say "shame on the programmer" about those who
wrote the programs which allowed all those nasty viruses and worms of a
couple of years ago to spread (I mean the programmers of the mail
clients etc., not the virus writers), that doesn't change the fact that
they cost various people millions of dollars.

>There is a very good set of normalization test data available for
>both Unicode 4.0.0 and now for Unicode 4.1.0. Anyone who puts
>out an implementation of normalization that cannot pass the
>appropriate version test deserves what they get.
>
>

Indeed everyone should test their programs extensively for each new
version. But will they? And if they don't, do their customers deserve
what they get?
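
For what it is worth, running the test data need not be a large job. The following is a rough Python sketch of an NFC check over lines in the `NormalizationTest.txt` layout (five semicolon-separated columns of hex code points, with `#` comments and `@Part` headers). The function names are hypothetical, the two sample lines are constructed by hand for illustration and not copied from the file, and only the NFC invariants are checked.

```python
import unicodedata


def parse_line(line):
    """Return the five columns as strings, or None for comment/header lines."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("@"):
        return None
    fields = line.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in field.split()) for field in fields]


def check_nfc(columns, normalize=unicodedata.normalize):
    """NFC invariants: c2 == NFC(c1) == NFC(c2) == NFC(c3); c4 == NFC(c4) == NFC(c5)."""
    c1, c2, c3, c4, c5 = columns
    return (all(normalize("NFC", c) == c2 for c in (c1, c2, c3))
            and all(normalize("NFC", c) == c4 for c in (c4, c5)))


# Hand-built sample lines in the file's layout (assumed, for illustration only).
SAMPLE = [
    "@Part1 # Character by character test",
    "00C5;00C5;0041 030A;00C5;0041 030A; # A with ring above",
    "FACF;2284A;2284A;2284A;2284A; # mapping quoted in this thread",
]

if __name__ == "__main__":
    for raw in SAMPLE:
        cols = parse_line(raw)
        if cols is not None:
            print("ok  " if check_nfc(cols) else "FAIL", raw)
```

Passing a different `normalize` callable lets the same loop exercise a normaliser other than Python's own, which is the case that matters when only the data has been updated.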

>In neither case is this a security hole *caused* by the allocation.
>
>

Fair enough, but it is potentially laid open by the allocation. Programs
can be a bit like minefields, full of bugs which might blow up on you at
any time. Careful sweeping of the commonly used parts of the
multidimensional data space has cleared out the bugs which are most
likely to cause trouble. But in areas off the beaten track lurk
unexploded bugs (to mix a metaphor), ready to blow up in your face as
soon as you feed in novel kinds of data which cause the program to
follow untested paths. That is the danger here.

-- 
Peter Kirk
[email protected] (personal)
[email protected] (work)
http://www.qaya.org/

