Re: Unicode and Security

From: David Starner (starner@okstate.edu)
Date: Sat Feb 02 2002 - 23:02:26 EST


On Sun, Feb 03, 2002 at 11:41:11AM +0900, Gaspar Sinai wrote:
> I had the following problems where unicode could not
> be used because of security issues. In all cases
> the signer of a document can be lured into
> believing that the wording of the document he/she
> is about to sign is different.

This seems more like a legal issue than anything else. It's not legal to
lure someone into believing that the wording of the document to be
signed is different. I think you're trying to apply a technical solution
to a legal problem.
 
> 1. Character Order Problem
>
> The BIDI algorithm is too complex and not reversible.
> I could create a BIDI document where only RLO LRO and
> PDF characters were used, and the WORD, JAVA and KDE
> produced different word ordering. I don't have access
> to MS platform now to reproduce this but as far as
> I can tell it was like:
>
> <RLO>text1<PDF>U+0020<RLO>text2<PDF>
>
> Because the BIDI algorithm is too complex and vague
> it can be said that these programs all displayed
> the text correctly, still differently.
>
> text1 text2
> text2 text1

If you support the RLO/PDF characters, the answer is 1txet 2txet, if I'm
reading it right. If you don't, then there's no reason to run the bidi
algorithm, and the answer is text1 text2.
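
To make that reading concrete, here is a rough sketch in Python. It is not
the full bidi algorithm: it assumes a left-to-right paragraph, handles only
non-nested RLO...PDF runs, and the function name visual_order is made up for
this illustration.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def visual_order(logical: str) -> str:
    """Toy reordering: LTR paragraph, non-nested RLO...PDF runs only."""
    out = []
    run = None  # characters collected inside the current override, if any
    for ch in logical:
        if ch == RLO:
            run = []
        elif ch == PDF:
            if run is not None:
                # An overridden run is displayed right to left.
                out.extend(reversed(run))
                run = None
        elif run is not None:
            run.append(ch)
        else:
            out.append(ch)
    if run is not None:
        # An unterminated override lasts to the end of the paragraph.
        out.extend(reversed(run))
    return "".join(out)


sample = RLO + "text1" + PDF + " " + RLO + "text2" + PDF
print(visual_order(sample))  # 1txet 2txet
```

Each overridden run is reversed on its own, and the space between them stays
at the paragraph's left-to-right level, so the two runs keep their order.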
 
> Whether ligature forming will actually happen or not
> is completely up to the font. If the font does have
> the ligature, it will be formed. The standard does
> not define all the compulsory ligatures.

The whole point of this is that ligatures shouldn't be something most
users have to worry about, and they shouldn't be something that changes
meaning. If I'm using Times New Roman, it should make the ff, fi, and
ffi ligatures automatically. If I switch the document to an old-style
font, it should do ct and st automatically.
 
> b) Hidden Marks
> It is possible to make a combining mark, like a
> negation mark appear in the base character's body
> making it invisible. It is nearly impossible to
> test the rendering engine for all possible
> combinations.

Sure.
 
> 3. Text Search Problem
>
> It is possible to create texts that look the same,
> but they cannot be searched because even when fully
> decomposed and ordered they will be different.

I don't see a solution for this. U+0030, U+004F, U+006F, U+039F, U+041E,
U+0555, U+0A66, U+0AE6, U+0B66, U+0C66, U+0CE6, U+0E50, U+0ED0, U+1040,
U+17E0, U+2070, U+2080, U+2134, U+25CB, U+25EF, U+274D, and U+3007 are
all closed circular shapes. But while they could be confused when used
inappropriately, they each have distinct meaning and use. If you want
text to be searchable, then encode it properly. If you don't, well,
that's your choice.

This is true in preexisting standards, too - any that include two of the
Latin, Cyrillic and Greek scripts.
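
For what it's worth, this is easy to check with Python's standard
unicodedata module. The sketch below takes just three of the look-alikes
(Latin, Greek and Cyrillic capital O); the helper name fold and the sample
strings are made up for the illustration.

```python
import unicodedata

# Three capital letters that most fonts draw identically.
lookalikes = ["\u004f", "\u039f", "\u041e"]

for ch in lookalikes:
    # Prints the code point and its character name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))


def fold(s: str) -> str:
    """Fully decompose a string (compatibility decomposition, NFKD)."""
    return unicodedata.normalize("NFKD", s)


latin = "BOOK"
mixed = "B\u041e\u041eK"  # same look, but with CYRILLIC CAPITAL LETTER O

print(fold(latin) == fold(mixed))  # False: decomposition does not unify them
print(mixed.find("OO"))            # -1: a search for Latin "OO" finds nothing
```

Normalization only unifies different encodings of the same character; it
says nothing about distinct characters that happen to look alike.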

I think I'm missing your perspective. To me, these are minor quirks. Why
do you see them as huge problems?

-- 
David Starner - starner@okstate.edu, dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
What we've got is a blue-light special on truth. It's the hottest thing 
with the youth. -- Information Society, "Peace and Love, Inc."


