RE: How to distinguish UTF-8 from Latin-* ?

From: Robert A. Rosenberg (bob.rosenberg@digitscorp.com)
Date: Fri Jun 23 2000 - 14:01:03 EDT


At 09:41 AM 06/22/2000 -0800, Karlsson Kent - keka wrote:
><snip>
>"Be liberal with what you accept and conservative with what you create"]).
>
>
>Well, there is a security aspect to this: sometimes given texts
>need to be scanned to try to determine if they are "harmless"
>or may trigger some undesirable interpretation (as interpreted
>program code, like shell-script, for instance). A hacker may
>try to hide characters that trigger the undesired, and potentially
>dangerous, interpretation, by using overlong UTF-8 sequences.
>If the security scanner program does not "decode" overlong
>UTF-8 sequences,

Since the interpreter will only see the text if the security system has
"signed off" on its harmlessness, nothing prevents the security system from
normalizing the overlong sequences before doing its scan and then acting as
if that were the form in which they were supplied. The interpreter could
then accept the overlong data (or the normalized copy that the security
system checked and then passed to it).
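
A rough sketch of that normalize-then-scan idea, in Python. The names
decode_lenient, scan_then_pass and the DANGEROUS set are made up for
illustration and are not taken from any real scanner. The decoder is written
by hand because Python's own UTF-8 codec refuses overlong sequences.

```python
# Hypothetical set of characters the scanner treats as unsafe (assumption).
DANGEROUS = set(";|&`$")


def decode_lenient(data: bytes) -> str:
    """Decode UTF-8 without the shortest-form check, so an overlong
    sequence is mapped to the code point it spells out."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, extra = b, 0
        elif (b & 0xE0) == 0xC0:
            cp, extra = b & 0x1F, 1
        elif (b & 0xF0) == 0xE0:
            cp, extra = b & 0x0F, 2
        elif (b & 0xF8) == 0xF0:
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        tail = data[i + 1 : i + 1 + extra]
        if len(tail) != extra or any((t & 0xC0) != 0x80 for t in tail):
            raise ValueError(f"bad continuation byte after offset {i}")
        for t in tail:
            cp = (cp << 6) | (t & 0x3F)
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")
        out.append(chr(cp))
        i += 1 + extra
    return "".join(out)


def scan_then_pass(data: bytes) -> str:
    """Normalize first, scan the normalized text, and return that same
    normalized copy for handing on to the interpreter."""
    text = decode_lenient(data)
    if any(ch in DANGEROUS for ch in text):
        raise ValueError("input rejected by scanner")
    return text
```

Here decode_lenient(b"\xc0\xaf") gives "/", and scan_then_pass rejects
b"ls\xc0\xbbrm" because the overlong pair C0 BB normalizes to ";" before the
scan looks at it.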

>but the interpreter accepts them as if nothing
>was wrong, things you would not like to happen might happen.
>So overlong UTF-8 sequences should be regarded as errors, and
>not as a coding for any character at all. Yes, you may regard
>systems that at all have "escapes" into "execute this" mode
>as ill-designed. But they are around.
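
For comparison, a minimal sketch of the stricter position quoted above, where
an overlong sequence is an error and not a coding for any character. It leans
on the fact that Python's built-in UTF-8 codec already rejects overlong
forms; the function name is_strict_utf8 is only an illustration.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8.

    Python's codec raises UnicodeDecodeError on overlong sequences,
    so they are reported here as invalid input.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With this check, is_strict_utf8(b"/") is True, while the overlong encodings
of the same character, b"\xc0\xaf" and b"\xe0\x80\xaf", both come back False.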


