Does the UTC need to address the issue of malformed and illegal UTF-8
sequences, etc.? The text in question is the example in D32 and the last
sentence of the section on shortest encoding.
The Unicode philosophy has been to avoid killing characters your software
doesn't understand. This enables adding new characters to the code without
killing the software that was written before the new characters were added.
The Security philosophy seems to be: If it is out of specification, kill it
("Anything not explicitly allowed is denied.").
The situation in the attached message is not quite the same as the one in
the first paragraph above. RFC 2279 (UTF-8) lists some examples that could
cause security problems. Section 3.8 of The Unicode Standard, Version 3.0
seems to permit interpretation of "ill-formed code value sequences" that can
cause other software to misinterpret the characters and take the wrong
action.
The issue for the UTC may be: if a process receives an "ill-formed" code
sequence, should the standard specify the action, or allow interpretation
and give warnings (as RFC 2279 does)? Will more software break if the
ill-formed sequence is allowed or denied? Given the number of security
problems and fixes I see each week, I personally think that the UTC needs to
tighten the algorithms and require an exception condition rather than
interpret the ill-formed code value sequences.
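As an aside (not part of the original message): the "raise an exception"
behavior argued for here is exactly what a strict decoder does. A minimal
Python sketch, using Python's built-in UTF-8 decoder:

```python
# 0xC0 0xAF is an overlong (two-byte) encoding of U+002F '/'.
overlong_slash = b"\xc0\xaf"

# A strict decoder raises an exception condition instead of interpreting
# the ill-formed sequence.
try:
    overlong_slash.decode("utf-8")
except UnicodeDecodeError as e:
    print("rejected:", e.reason)

# The shortest-form encoding of the same character decodes normally.
assert b"\x2f".decode("utf-8") == "/"
```

This is the tightened behavior the paragraph above asks the UTC to require.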
Edwin F. Hart
The Johns Hopkins University Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099
+1-443-778-6926 (Baltimore area)
+1-240-228-6926 (Washington, DC area)
From: Cris Bailiff [mailto:c.bailiff@E-SECURE.COM.AU]
Sent: Thursday, October 19, 2000 6:08 AM
Subject: Re: IIS %c1%1c remote command execution
> Florian Weimer <Florian.Weimer@RUS.UNI-STUTTGART.DE> writes:
> This is one of the vulnerabilities Bruce Schneier warned of in one of
> the past CRYPTO-GRAM issues. The problem isn't the faulty path
> checking alone, but also a poorly implemented UTF-8 decoder.
> RFC 2279 explicitly says that overlong sequences such as 0xC0 0xAF are
> illegal, and warns about the security consequences of accepting them.
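To make the quoted example concrete, here is a hypothetical sketch (not the
actual IIS code, which none of us has seen) of the bug class: a lenient
decoder that applies the RFC 2279 bit-packing formula for two-byte sequences
without checking for shortest form.

```python
def lenient_decode_2byte(b1, b2):
    # 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy, with no check that the
    # result actually required two bytes (the shortest-form rule).
    return chr(((b1 & 0x1F) << 6) | (b2 & 0x3F))

# Such a decoder turns the overlong sequence 0xC0 0xAF into '/', so a
# request containing "..%c0%af" becomes "../" only AFTER the path filter
# has already run.
print(lenient_decode_2byte(0xC0, 0xAF))  # '/'
```

The filter sees no "../" in the raw bytes; the filesystem sees one after
decoding. That ordering mismatch is the whole vulnerability.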
As someone often involved in reviewing and improving other people's web
code, I have been citing the unicode security example from RFC 2279 as one
good reason why web programmers must enforce 'anything not explicitly
allowed is denied', almost since it was written. In commercial situations I
have argued myself blue in the face that the equivalent of (perl speak)
s!../!!g is not good enough to clean up filename form input parameters or
other pathnames (in perl, ASP, etc.). I always end up being proved right,
but it takes a lot of effort. That argument should prove a bit easier from
now on :-(
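A small sketch of the point above, using Python's re.sub as the equivalent
of the Perl substitution (the example paths are made up): such a
strip-the-bad-pattern filter fails in two independent ways.

```python
import re

def strip_dotdot(path):
    # Equivalent of the Perl s!../!!g: delete every literal "../".
    return re.sub(r"\.\./", "", path)

# Failure 1: a single pass can re-create the very sequence it removed.
print(strip_dotdot("....//etc/passwd"))   # -> "../etc/passwd"

# Failure 2: alternate encodings of "../" never match the pattern at all,
# and are decoded into "../" by a later layer.
print(strip_dotdot("..%c0%af..%c0%afetc/passwd"))  # unchanged
```

Both failures are why "deny anything not explicitly allowed" (validate
against a whitelist after full decoding) beats trying to enumerate and strip
the bad patterns.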
> It's a pity that a lot of UTF-8 decoders in free software fail such
> tests as well, either by design or careless implementation.
The warning in RFC 2279 hasn't been heeded by a single unicode decoder that
I have ever tested, commercial or free, including the Solaris 2.6 system
libraries, the Linux unicode_console driver, Netscape Communicator and now,
obviously, IIS. It's unclear to me whether the IIS/NT unicode decoding is
performed by a system-wide library or if it's custom to IIS - either way, it
can potentially affect almost any unicode-aware NT application.
I have resisted upgrading various cgi and mod_perl based systems to perl5.6
because it has inbuilt (default?) unicode support, and I've no idea which
applications or perl libraries might be affected. The problem is even harder
than it looks - which sub-system, out of the http server, the perl (or ASP
or other) runtime, the standard C libraries and the kernel/OS can I expect
to be doing the conversion? Which one will get it right? I think Bruce
wildly understated the problem, and I've no idea how to put the brakes on
the crash dive into a character encoding standard which seems to have no
defined canonical form and no obvious way of performing deterministic
comparisons.
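On the comparison problem: Unicode does define normalization forms (UAX
#15), though, as the paragraph above suggests, the hard part is knowing
which layer of the stack applies them. A minimal Python illustration, using
the standard unicodedata module, of why naive equality disagrees with
canonical-equivalence comparison:

```python
import unicodedata

a = "caf\u00e9"    # 'é' as one precomposed code point (U+00E9)
b = "cafe\u0301"   # 'e' followed by combining acute accent (U+0301)

# Byte-for-byte (code-point) comparison says they differ...
print(a == b)  # False

# ...but after normalizing both to NFC they compare equal.
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True
```

Two systems that compare the same strings at different stages (one before
normalization, one after) will disagree, which is exactly the
non-deterministic behavior the author worries about.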
I suppose as a security professional I should be happy, looking forward to a
steady stream of work like this.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:14 EDT