L2/00-370

"Hart, Edwin F." <Edwin.Hart@jhuapl.edu> on 10/23/2000 07:31:38 AM

To: "Unicode List" <unicode@unicode.org>
Subject: UTC action on malformed/illegal UTF-8 sequences?

Does the UTC need to address the issue of malformed and illegal UTF-8 sequences, etc.? The text in question is the example in D32 and the last sentence of the section on shortest encoding.

Background

The Unicode philosophy has been to avoid killing characters your software doesn't understand. This enables adding new characters to the code without killing the software that was written before the new characters were added.

The Security philosophy seems to be: If it is out of specification, kill it ("Anything not explicitly allowed is denied.").

The situation in the attached message is not the same as in the first paragraph. RFC 2279 (UTF-8) lists some examples that could cause security problems. Section 3.8 of The Unicode Standard, Version 3.0 seems to permit interpretation of "ill-formed code value sequences" that can cause other software to misinterpret the characters and produce the wrong action.

The issue for UTC may be: If a process receives an "ill-formed" code sequence, should the standard specify the action, or allow interpretation and give warnings (like RFC 2279)? Will more software break if the ill-formed sequence is allowed or denied?

Given the number of security problems and fixes I see a week, I personally think that the UTC needs to tighten the algorithms and require an exception condition rather than interpret the ill-formed code value sequences.

Ed

Edwin F. Hart
edwin.hart@jhuapl.edu
The Johns Hopkins University Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099 USA
443-778-6926 (Baltimore area)
240-228-6926 (Washington, DC area)
443-778-1093 (fax)
240-228-1093 (fax)

-----Original Message-----
From: Cris Bailiff [mailto:c.bailiff@E-SECURE.COM.AU]
Sent: Thursday, October 19, 2000 6:08 AM
To: BUGTRAQ@SECURITYFOCUS.COM
Subject: Re: IIS %c1%1c remote command execution

> Florian Weimer <Florian.Weimer@RUS.UNI-STUTTGART.DE> writes:
>
> This is one of the vulnerabilities Bruce Schneier warned of in one of
> the past CRYPTO-GRAM issues. The problem isn't the wrong time of
> path checking alone, but as well a poorly implemented UTF-8 decoder.
> RFC 2279 explicitly says that overlong sequences such as 0xC0 0xAF are
> invalid.

As someone often involved in reviewing and improving other people's web code, I have been citing the unicode security example from RFC 2279 as one good reason why web programmers must enforce 'anything not explicitly allowed is denied' almost since it was written. In commercial situations I have argued myself blue in the face that the equivalent of (perl speak) s!../!!g is not good enough to clean up filename form input parameters or other pathnames (in perl, ASP, PHP etc.). I always end up being proved right, but it takes a lot of effort. Should prove a bit easier from now on :-(

>
> It's a pity that a lot of UTF-8 decoders in free software fail such
> tests as well, either by design or careless implementation.

The warning in RFC 2279 hasn't been heeded by a single unicode decoder that I have ever tested, commercial or free, including the Solaris 2.6 system libraries, the Linux unicode_console driver, Netscape Communicator and now, obviously, IIS.
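A minimal Python sketch of the overlong-sequence problem discussed above. It is not taken from any of the decoders named here. The function name decode_utf8, its strict flag, and the sample path are hypothetical and chosen only for illustration. With strict=False it mimics a decoder that ignores the RFC 2279 warning; with strict=True it rejects any sequence that is not the shortest form.

```python
# Illustrative sketch only: decode_utf8 and its `strict` flag are hypothetical names.
# Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
_MIN_CODE_POINT = (0x00, 0x80, 0x800, 0x10000)


def decode_utf8(data: bytes, strict: bool = True) -> str:
    """Decode UTF-8; when strict, reject overlong forms, surrogates, and > U+10FFFF."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            cp, trail = lead, 0
        elif (lead & 0xE0) == 0xC0:
            cp, trail = lead & 0x1F, 1
        elif (lead & 0xF0) == 0xE0:
            cp, trail = lead & 0x0F, 2
        elif (lead & 0xF8) == 0xF0:
            cp, trail = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + trail >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for byte in data[i + 1 : i + 1 + trail]:
            if (byte & 0xC0) != 0x80:
                raise ValueError(f"invalid continuation byte 0x{byte:02X}")
            cp = (cp << 6) | (byte & 0x3F)
        if strict:
            if cp < _MIN_CODE_POINT[trail]:
                raise ValueError(f"overlong encoding of U+{cp:04X} at offset {i}")
            if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                raise ValueError(f"code point U+{cp:04X} not allowed at offset {i}")
        out.append(chr(cp))
        i += trail + 1
    return "".join(out)


# 0xC0 0xAF is the overlong two-byte form of "/" cited from RFC 2279.
raw = b"..\xc0\xaf..\xc0\xafsecret.txt"   # hypothetical request path
print(b"../" in raw)                      # False: a byte-level "../" filter sees nothing
print(decode_utf8(raw, strict=False))     # ../../secret.txt
try:
    decode_utf8(raw)                      # strict mode refuses the input
except ValueError as err:
    print("rejected:", err)
```

The point of the sketch is the ordering problem: a filter that strips "../" from the raw bytes passes this input, and a lenient decoder further down the stack then produces the very string the filter was meant to remove. Python's own bytes.decode("utf-8") also rejects 0xC0 0xAF, raising UnicodeDecodeError.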
It's unclear to me whether the IIS/NT unicode decoding is performed by a system wide library or if it's custom to IIS - either way, it can potentially affect almost any unicode aware NT application. I have resisted upgrading various cgi and mod_perl based systems to perl5.6 because it has inbuilt (default?) unicode support, and I've no idea which applications or perl libraries might be affected.

The problem is even harder than it looks - which sub-system, out of the http server, the perl (or ASP or PHP...) runtime, the standard C libraries and the kernel/OS can I expect to be performing the conversion? Which one will get it right?

I think Bruce wildly understated the problem, and I've no idea how to put the brakes on the crash dive into a character encoding standard which seems to have no defined canonical encoding and no obvious way of performing deterministic comparisons.

I suppose as a security professional I should be happy, looking forward to a booming business...

Cris Bailiff
c.bailiff@e-secure.com.au