RE: Handling irregular sequences

From: Bernard Miller (forunicode@yahoo.com)
Date: Sun Oct 28 2001 - 07:52:37 EST


The question David Hollingsworth raised earlier did not
seem to get any responses on this list, so I've pasted
the text of his email below. I would also like
clarification on why UTF-8 in Unicode 3.1 forbids
conformant implementations only from interpreting
non-shortest forms for BMP characters, and does not
forbid interpreting all irregular sequences for all
characters.

___
Date: 5 Oct 2001 18:23:58 -0000
From: "David E. Hollingsworth" <deh@fastanimals.com>
To: unicode@unicode.org
Subject: Handling irregular sequences

The definition of UTF-32 (and the modifications to UTF-8 for Unicode
3.1) make it clear that conformant processes shall not generate
irregular sequences. However, they do not (and perhaps they
shouldn't) indicate what a process should do when encountering an
irregular sequence, and I'm curious what people are doing in practice.
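
One possible policy is simply to reject such input. The following is a
minimal sketch of that policy, not a recommendation from the standard.
It assumes Python 3, chosen here only for illustration, and takes an
irregular UTF-8 sequence to be a six-byte sequence that encodes a
surrogate pair as two three-byte halves. The names decode_strict,
regular, and irregular are hypothetical.

```python
def decode_strict(data: bytes) -> str:
    """Decode UTF-8, refusing irregular (surrogate-encoding) sequences.

    Python 3's built-in UTF-8 codec already treats an encoded surrogate
    (lead byte 0xED followed by 0xA0..0xBF) as an error, so the default
    strict mode is enough for this sketch.
    """
    return data.decode("utf-8")


# U+10400 (a Deseret letter) in its regular four-byte form.
regular = b"\xf0\x90\x90\x80"

# The same code point as an irregular six-byte sequence: the surrogate
# pair D801 DC00, with each half encoded as its own three-byte sequence.
irregular = b"\xed\xa0\x81\xed\xb0\x80"

print(decode_strict(regular) == "\U00010400")

try:
    decode_strict(irregular)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Whether rejecting, replacing, or interpreting is the right behavior is
exactly the open question; the sketch only shows what rejection looks
like.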

One could apply the traditional Internet aphorism of being liberal in
what one accepts, but that didn't pan out so well for
non-shortest-form UTF-8, so in addition to wondering what people are
doing in practice, I'm also curious about the following theoretical
issue:

It doesn't seem very likely to me that someone would write a security
check that depends on, say, passing Deseret code points but blocking
musical notation code points; however, I wouldn't say it's impossible;
moreover, a security check that wants to disallow all non-BMP
characters doesn't seem quite so outlandish. If someone did write
such a check, it seems to me that the attack described in UAX #27
would apply, by substituting "irregular sequence" for "non-shortest
form":

  Process A performs security checks, but does not check for irregular
  sequences.

  Process B accepts the byte sequence from process A, and transforms
  it into UTF-16 while interpreting irregular sequences.

  The UTF-16 text may then contain characters that should have been
  filtered out by process A.
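
The three steps above can be sketched as follows. This is an
illustration under stated assumptions, not a description of any real
implementation: it assumes Python 3, a hypothetical process A that
blocks non-BMP characters by looking only for four-byte UTF-8 lead
bytes, and a hypothetical process B that passes encoded surrogates
through when converting to UTF-16. The function names are invented
for the example.

```python
def process_a_allows(data: bytes) -> bool:
    """Hypothetical BMP-only filter.

    Rejects any byte that could start a four-byte UTF-8 sequence, but
    does not check for irregular (surrogate-encoding) sequences.
    """
    return not any(0xF0 <= b <= 0xF7 for b in data)


def process_b_to_utf16(data: bytes) -> bytes:
    """Hypothetical lenient converter from UTF-8 to UTF-16 (big-endian).

    The "surrogatepass" error handler lets encoded surrogates through
    instead of raising, which amounts to interpreting irregular input.
    """
    text = data.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-be", errors="surrogatepass")


# U+10400 as an irregular six-byte sequence (surrogates D801 and DC00,
# each encoded as three bytes). No byte here is in the range 0xF0..0xF7.
irregular = b"\xed\xa0\x81\xed\xb0\x80"

passed_filter = process_a_allows(irregular)
utf16 = process_b_to_utf16(irregular)

# The UTF-16 output is an ordinary surrogate pair, so any consumer of
# the UTF-16 text sees a single non-BMP character.
smuggled = utf16.decode("utf-16-be")

print(passed_filter, utf16.hex(), hex(ord(smuggled)))
```

In this sketch the regular form b"\xf0\x90\x90\x80" is blocked by
process A, while the irregular form of the same character gets through.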

Even if I'm mistaken about this, is there a specific argument *for*
accepting irregular sequences?

  --deh!

___

Bernard



