Handling irregular sequences

From: David E. Hollingsworth (deh@fastanimals.com)
Date: Fri Oct 05 2001 - 14:23:58 EDT


The definition of UTF-32 (and the modifications to UTF-8 for Unicode
3.1) make it clear that conformant processes shall not generate
irregular sequences. However, they do not (and perhaps they
shouldn't) indicate what a process should do when encountering an
irregular sequence, and I'm curious what people are doing in practice.

One could apply the traditional Internet aphorism of being liberal in
what one accepts, but that didn't pan out so well for
non-shortest-form UTF-8, so in addition to wondering what people are
doing in practice, I'm also curious about the following theoretical
issue:

It doesn't seem very likely to me that someone would write a security
check that depends on, say, passing Deseret code points while blocking
musical notation code points, but I wouldn't say it's impossible.
Moreover, a security check that wants to disallow all non-BMP
characters doesn't seem quite so outlandish. If someone did write
such a check, it seems to me that the attack described in UAX #27
would apply, by substituting "irregular sequence" for "non-shortest
form":

  Process A performs security checks, but does not check for irregular
  sequences.

  Process B accepts the byte sequence from process A, and transforms
  it into UTF-16 while interpreting irregular sequences.

  The UTF-16 text may then contain characters that should have been
  filtered out by process A.
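A rough sketch of this in Python (the filter and the lenient decoder
here are hypothetical stand-ins for processes A and B, not any
particular implementation): a byte-level check for non-BMP characters
looks only for 4-byte UTF-8 lead bytes, so an irregular sequence --
a UTF-16 surrogate pair encoded as two 3-byte UTF-8 sequences --
sails through, yet a decoder that interprets irregular sequences
still yields the supplementary-plane character.

```python
def blocks_non_bmp(data: bytes) -> bool:
    """Process A: reject input containing a 4-byte UTF-8 lead byte
    (0xF0..0xF4), i.e. any *regularly* encoded non-BMP character."""
    return any(0xF0 <= b <= 0xF4 for b in data)

def lenient_decode(data: bytes) -> str:
    """Process B: decode UTF-8, but also interpret irregular
    sequences (surrogate halves encoded as 3-byte sequences)."""
    # 'surrogatepass' lets the lone surrogate code points through...
    s = data.decode("utf-8", "surrogatepass")
    # ...and a round trip through UTF-16 re-pairs them into one
    # supplementary-plane character.
    return s.encode("utf-16", "surrogatepass").decode("utf-16")

# U+10400 DESERET CAPITAL LETTER LONG I, encoded irregularly as the
# UTF-8 forms of its UTF-16 surrogates D801 DC00:
irregular = bytes([0xED, 0xA0, 0x81, 0xED, 0xB0, 0x80])

assert not blocks_non_bmp(irregular)               # slips past process A
assert lenient_decode(irregular) == "\U00010400"   # process B sees Deseret
```

So a filter that only understands regular UTF-8 never sees the
non-BMP character that a lenient downstream consumer reconstructs.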

Even if I'm mistaken about this, is there a specific argument *for*
accepting irregular sequences?

  --deh!



This archive was generated by hypermail 2.1.2 : Fri Oct 05 2001 - 12:41:18 EDT