Re[2]: Unicode and Security

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Feb 08 2002 - 13:24:01 EST


At 06:18 PM 2/8/02 +0100, Philipp Reichmuth wrote:
>Oh, it is entirely possible to design a character set that supports
>all of Latin, Cyrillic and Greek without being susceptible to this
>problem beyond the familiar 1-l-|, 0-O dimension. The main premise is
>to encode glyphs instead of characters, so that one glyph "A" is used
>in all three of these alphabets. Roundtrip compatibility with legacy
>character sets would be a problem, though. It looks like a choice
>between kludge A (no roundtrip compatibility) and kludge B (easier
>spoofability).

If your statement were phrased differently, i.e. saying that domain name
registration and resolution should not distinguish between two names that
both display as A.com, one spelled with the Greek capital Alpha and one with
the Latin capital A, that would be a different matter. Such a rule would
close this spoofing loophole very effectively without restricting the
registration of meaningful names. There may be subtle issues with such an
approach, but the important thing is that it does not fiddle with the
character set as such.
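The rule suggested above can be sketched by comparing names through a "skeleton" key in which look-alike characters collapse to one representative, while the registered names themselves keep their original characters. This is a minimal Python sketch under stated assumptions: `CONFUSABLES` is a tiny hypothetical table (a real registry would need a far larger, curated data set), case folding is omitted for brevity, and `skeleton` and `would_collide` are illustrative names only.

```python
import unicodedata
from collections.abc import Iterable

# Hypothetical, deliberately tiny table of look-alike capitals.
# A real registry would need a much larger, curated data set.
CONFUSABLES = {
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0392": "B",  # GREEK CAPITAL LETTER BETA
    "\u0412": "B",  # CYRILLIC CAPITAL LETTER VE
}


def skeleton(name: str) -> str:
    """Return a comparison key in which look-alike characters coincide.

    The key is used only for the registration check; the name itself is
    stored unchanged, so the character set is not touched.
    """
    normalized = unicodedata.normalize("NFKC", name)
    return "".join(CONFUSABLES.get(ch, ch) for ch in normalized)


def would_collide(candidate: str, registered: Iterable[str]) -> bool:
    """True if candidate looks like an already registered name."""
    key = skeleton(candidate)
    return any(skeleton(existing) == key for existing in registered)


if __name__ == "__main__":
    taken = {"A.com"}                          # spelled with Latin A
    print(would_collide("\u0391.com", taken))  # Greek Alpha: True
    print(would_collide("B.com", taken))       # False
```

Because only the comparison key is folded, the stored and displayed names stay exactly as registered.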

>However, for URLs etc., roundtrip
>compatibility is not really necessary, I think.

I beg to differ. Roundtrip convertibility is very important, since URLs live
in documents encoded in Unicode, ISO/IEC 8859-7, even Shift-JIS, etc., none
of which are 'glyph' encodings. Whatever specialized 'character set' gets
used transiently in resolving the domain name is one issue, but it had
better be easy to convert between it and the form in which URLs are
actually stored in hypertext.
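Roundtrip convertibility can be checked mechanically: text converts cleanly if encoding it into a legacy character set and decoding it again yields the original string. The Python sketch below uses the standard `iso8859_7` and `shift_jis` codecs. `GLYPH_UNIFY` is a hypothetical "glyph" encoding, assumed purely for illustration, that stores Greek capital Alpha and Beta as their Latin look-alikes and therefore cannot tell afterwards which script was used.

```python
# Hypothetical glyph-based encoding: Greek capitals that look like Latin
# ones are stored as the Latin letters; decoding cannot undo this.
GLYPH_UNIFY = str.maketrans({"\u0391": "A", "\u0392": "B"})


def roundtrips(text: str, encoding: str) -> bool:
    """True if text survives conversion to `encoding` and back."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False  # the legacy set cannot represent some character


def glyph_roundtrips(text: str) -> bool:
    """True if the hypothetical glyph encoding leaves text unchanged."""
    return text.translate(GLYPH_UNIFY) == text


if __name__ == "__main__":
    host = "\u03b1\u03b2\u03b3.gr"                 # Greek small alpha, beta, gamma
    print(roundtrips(host, "iso8859_7"))           # True
    print(roundtrips(host, "shift_jis"))           # True
    print(roundtrips(host, "latin-1"))             # False
    print(glyph_roundtrips("\u0391\u0392\u0393"))  # False: script lost
```

A lossy mapping of this kind might still serve as a transient lookup key during resolution, but it cannot be the form in which URLs are stored.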

>I am sure they can be fixed by designing a character set that is
>better suited to a given problem. A lot of problems can be avoided
>by regarding a character set as an application-specific entity to some
>extent.
>
>This is not what we want, of course; we want a universal encoding
>across all applications. This being our premise, the resulting
>problems which you cannot possibly deny will have to be dealt with in
>one way or the other.

Nobody argues that spoofing and other security issues shouldn't get
addressed.

>To me, it seems a better idea to fix problems that arise directly
>from the way we encode our characters at the character set level, as
>far as possible, even if that just means notifying people that mixing
>characters from different alphabets may lead to misinterpretations,
>and documenting common glyph similarities in the standard, such as the
>glyph "A" (or, for that matter, the character "A") being
>indistinguishable across several alphabets.

And we are certainly doing that. But while A is an important character,
there are nearly 70,000 Han characters out there, some with distinctions
so subtle that many fonts will not show them and many users will not
recognize them. This has not featured in this discussion so far, which
nicely shows how our perception of the issues is colored by our personal
experience with scripts and languages. For Han characters, even my simple
suggestion above is probably not practical.
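A warning when a name mixes alphabets, as proposed in the quoted text, is easy to approximate, and the same sketch shows why it does nothing for Han. The Python below is a rough heuristic assumed for illustration: it takes the first word of each letter's Unicode character name (`LATIN`, `GREEK`, `CYRILLIC`, `CJK`, ...) as a stand-in for its script, which is not the same as the Unicode Script property. `scripts_used` and `is_mixed_script` are hypothetical names.

```python
import unicodedata


def scripts_used(label: str) -> set[str]:
    """Approximate the scripts of the letters in label.

    Heuristic: the first word of the character name, e.g. 'LATIN' for
    'LATIN SMALL LETTER A'. Digits and punctuation are ignored.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(label: str) -> bool:
    """True if the letters in label appear to come from several scripts."""
    return len(scripts_used(label)) > 1


if __name__ == "__main__":
    print(is_mixed_script("apple"))         # False
    print(is_mixed_script("\u0430pple"))    # True: Cyrillic a + Latin
    print(is_mixed_script("\u4e00\u4e8c"))  # False: Han only
```

Two Han names that differ only by a subtle variant are both single-script, so a check like this passes them without comment.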

A./



This archive was generated by hypermail 2.1.2 : Fri Feb 08 2002 - 12:27:04 EST