Re: Fw: Unicode filename problems

From: Philippe Verdy (
Date: Sun Jun 01 2003 - 06:02:34 EDT

    From: "Raymond Mercier" <>
    > At 00:11 01/06/2003 +0200, you wrote:
    > >but certainly not for the file index stored in a ZIP file where there's no
    > >reason why it should not contain correctly encoded and portable UTF-8 names
    > Doesn't one have to know the binary format of a Zip file to be sure of that?
    > I suppose that is proprietary, and in any case, I don't have it.

    It's not proprietary. The format is fully documented and implemented by many zip tools, some of them open source or source-disclosed (such as the JAR tool in the Java JDK).
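    To illustrate that the index can be read from the published layout alone, here is a minimal Python sketch that walks the central directory by hand. The helper name raw_index_names is hypothetical, and the sketch assumes a small single-disk archive with no archive comment and no ZIP64 records; the standard zipfile module is used only to produce test data.

```python
import io
import struct
import zipfile

# Fixed-size records, little-endian, as laid out in the published ZIP format.
EOCD = struct.Struct("<IHHHHIIH")          # end of central directory, 22 bytes
CDH = struct.Struct("<IHHHHHHIIIHHHHHII")  # central directory header, 46 bytes

def raw_index_names(data):
    """Return the raw name bytes of every central directory entry.

    Hypothetical helper: assumes no archive comment (so the end record is
    the last 22 bytes), a single disk, and no ZIP64 extensions.
    """
    sig, _, _, _, total, _, offset, comment_len = EOCD.unpack(data[-EOCD.size:])
    if sig != 0x06054B50 or comment_len != 0:
        raise ValueError("archive layout not supported by this sketch")
    names = []
    for _ in range(total):
        fields = CDH.unpack_from(data, offset)
        if fields[0] != 0x02014B50:
            raise ValueError("bad central directory signature")
        name_len, extra_len, entry_comment_len = fields[10], fields[11], fields[12]
        start = offset + CDH.size
        names.append(data[start:start + name_len])
        # Skip the variable-length name, extra field and comment.
        offset = start + name_len + extra_len + entry_comment_len
    return names

# Build a tiny archive in memory to exercise the parser.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"one")
    zf.writestr("dir/b.txt", b"two")
raw_names = raw_index_names(buf.getvalue())
```

    The names come back as raw bytes; how they are to be decoded is exactly the encoding question under discussion.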

    The only thing that may cause problems is the choice of compression algorithm, which must be made with extreme care because of licensing requirements. The decompression algorithm and the compressed binary file format, by contrast, are generally royalty-free and fully documented.

    If one is not sure about the legality of a compression algorithm, it's best to use a generic library that has been scrutinized by many people for compliance with licensing requirements. You can still use whatever decompression algorithm you want, or rewrite it as you see fit, using the published specifications.

    Look for example at Microsoft's proprietary .cab format. Its binary encoding and format specification are fully disclosed by Microsoft, which means that you can create your own compressor, provided that it produces the documented format, readable according to these specs. What Microsoft really protects, under very restrictive terms, is only its compression algorithm and its implementation library.

    Even in that case, neither the compression and decompression algorithms nor the binary file format prevents you from using a Unicode-savvy encoding for the internal file index of your archive file.
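    As a hedged illustration of a Unicode-savvy index, the Python sketch below stores a non-ASCII name and reads the index back. It relies on a convention from later revisions of the published ZIP specification (general-purpose flag bit 11, marking a name as UTF-8), which is an assumption relative to this thread. The helper names build_archive and read_index are hypothetical.

```python
import io
import zipfile

# General-purpose flag bit 11: "name and comment are UTF-8" in later
# revisions of the ZIP specification (an assumption relative to this thread).
UTF8_NAME_FLAG = 0x800

def build_archive(names):
    """Hypothetical helper: store one small member per name, uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.writestr(name, b"example")
    return buf.getvalue()

def read_index(data):
    """Return (decoded name, UTF-8 flag set?) for each index entry."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.filename, bool(info.flag_bits & UTF8_NAME_FLAG))
                for info in zf.infolist()]

data = build_archive(["plain.txt", "r\u00e9sum\u00e9.txt"])
index = read_index(data)
for name, is_utf8 in index:
    print(name, "UTF-8 flag set" if is_utf8 else "legacy encoding")
```

    The choice of ZIP_STORED shows that the encoding of the names is independent of whichever compression method is applied to the file contents.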