Re: Roundtripping in Unicode

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Dec 14 2004 - 13:32:02 CST


    On 14/12/2004 17:47, John Cowan wrote:

    >Peter Kirk scripsit:
    >
    >>I think the problem here is that a Unix filename is a string of octets,
    >>not of characters. And so it should not be converted into another
    >>encoding form as if it is characters; it should be processed at a quite
    >>different level of interpretation.
    >
    >Unfortunately, that is simply a counsel of perfection.
    >
    >Unix filenames are in general input as character strings, output as character
    >strings, and intended to be perceived as character strings. The corner
    >cases in which this does not work are not sufficient to overthrow the
    >power and generality to be achieved by assuming it 99% of the time.

    If so, this is a design flaw in Unix, or in how it is explained to
    users. Lars wrote, "Basically, you are not supposed to use strcpy to
    process filenames." I'm not sure whether that is his opinion or someone
    else's, but the only safe way out of this mess is never to process
    filenames as character strings, only as sequences of octets.
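
To make the octets-versus-characters distinction concrete, here is a minimal sketch in Python 3 on a Unix-like system. It assumes nothing beyond the standard library; the function names `list_raw_names` and `for_display` are hypothetical, chosen only for this illustration. The idea is to keep the raw octets as the real identity of the file and to decode them only for showing to a human.

```python
import os
from typing import List


def list_raw_names(directory: bytes) -> List[bytes]:
    """Return directory entries as raw octet strings, with no decoding.

    Passing a bytes path to os.listdir makes it return bytes entries,
    so no character encoding is assumed at any point.
    """
    return sorted(os.listdir(directory))


def for_display(name: bytes) -> str:
    """Give a human-readable rendering of a filename.

    Octets that are not valid UTF-8 are shown as \\xNN escapes. The
    result is for display only; always reopen the file using the
    original bytes, never this string.
    """
    return name.decode("utf-8", errors="backslashreplace")


if __name__ == "__main__":
    for raw in list_raw_names(b"."):
        print(for_display(raw))
```

The design choice in this sketch is that the lossy step (decoding) happens only at the display boundary, so a filename that is not valid in the current encoding can still be listed, opened, and renamed.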

    >(A private correspondent has come up with an ingenious trick which
    >depends on being able to create files named 0x08 and 0x7F, but it
    >truly is a trick, and in any case depends only on an ASCII interpretation.)
    >
    This may be called a "trick", but it looks as if it could very easily
    be a security hole. For example, on a terminal that honors backspace, a
    filename 0x41 0x08 0x42 ("A", backspace, "B") will be displayed the
    same as just 0x42 ("B"), in a Latin-1 or UTF-8 locale alike. Your
    friend's trick has become an open door for spoofers.
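
One way to narrow that door is to escape control octets before a filename is ever written to a terminal. The following is a minimal sketch in Python 3, not a complete defense; the function name `escape_controls` is hypothetical, and the choice to escape every octet outside printable ASCII (rather than decoding it in some locale) is an assumption made to keep the example encoding-neutral.

```python
def escape_controls(name: bytes) -> str:
    """Render a filename so that no octet can move the cursor or hide text.

    Printable ASCII (0x20..0x7E) is shown as-is, except that the
    backslash is doubled so the output stays unambiguous. Every other
    octet, including 0x08 (backspace) and 0x7F (delete), is shown as a
    \\xNN escape.
    """
    parts = []
    for octet in name:
        if octet == 0x5C:              # backslash: double it
            parts.append("\\\\")
        elif 0x20 <= octet <= 0x7E:    # printable ASCII
            parts.append(chr(octet))
        else:                          # control or non-ASCII octet
            parts.append(f"\\x{octet:02x}")
    return "".join(parts)


if __name__ == "__main__":
    print(escape_controls(b"A\x08B"))  # prints A\x08B
    print(escape_controls(b"B"))       # prints B
```

With this rendering the two names from the example above can no longer be confused on screen, since the backspace octet appears as visible text instead of being acted on by the terminal.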

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

