Re: Representing Unix filenames in Unicode

From: Philippe Verdy (
Date: Tue Nov 29 2005 - 10:14:55 CST

    From: "Antoine Leca" <>
    > On Tuesday, November 29th, 2005 07:03Z, Chris Jacobs wrote:
    >> What happens when two files have different, but canonically equivalent,
    >> file names?
    > The operating system sees two different files (without any relationship
    > to one another), and you (the user, the "human") see two files with
    > apparently the same handle to grasp them (the same name).
    > My idea is that you are going to lose, so probably thou shalt not do
    > that.
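
To make the situation in the quoted exchange concrete, here is a minimal Python sketch. The two sample names and the helper canonically_equivalent are illustrative assumptions, not something from the thread; it only shows that two canonically equivalent names are distinct byte strings as far as a byte-oriented filesystem is concerned.

```python
import unicodedata

# Two spellings of the same visible name (illustrative samples):
# precomposed U+00E9 versus "e" followed by combining acute U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

def canonically_equivalent(a: str, b: str) -> bool:
    """True when both strings share the same NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The UTF-8 byte sequences differ, so a byte-oriented filesystem
# treats these as two unrelated directory entries.
print(precomposed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# Yet a user interface that renders both will show the same name.
print(canonically_equivalent(precomposed, decomposed))
```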

    Why is that? The user interface can disambiguate the "user-friendly" name by
    displaying additional metadata about any file, for example by using
    the URL-encoding syntax (starting with "file:") if the name must be used in
    secure program interfaces.

    If a name can't be correctly decoded as valid UTF-8, or if it is different
    from its NFC form, or if it starts with "file:", I would suggest storing the
    filename only in its URL-encoded syntax (starting with "file:"), and simply
    avoiding any "shell escaping" mechanism (because such mechanisms are not
    portable, even on the same Unix/Linux system, since they depend on the
    capabilities of the shell, and because the URL syntax is independent of the
    filesystem type actually used).
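
The three conditions above can be sketched roughly as follows in Python. The function names needs_url_form, display_name and raw_name are hypothetical, and the exact shape of the encoded form ("file:" followed by the percent-encoded bytes of the name) is an assumption, since the message does not spell it out.

```python
import unicodedata
from urllib.parse import quote_from_bytes, unquote_to_bytes

PREFIX = "file:"  # assumed marker for the URL-encoded form

def needs_url_form(raw: bytes) -> bool:
    """Apply the three tests: invalid UTF-8, not in NFC, or starts with "file:"."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    if unicodedata.normalize("NFC", text) != text:
        return True
    return text.startswith(PREFIX)

def display_name(raw: bytes) -> str:
    """Return the plain name when it is safe, else the percent-encoded form."""
    if needs_url_form(raw):
        # safe="" so that every reserved character, including ":", is escaped.
        return PREFIX + quote_from_bytes(raw, safe="")
    return raw.decode("utf-8")

def raw_name(shown: str) -> bytes:
    """Inverse of display_name: recover the exact bytes of the filename."""
    if shown.startswith(PREFIX):
        return unquote_to_bytes(shown[len(PREFIX):])
    return shown.encode("utf-8")
```

Under these assumptions a plain name never begins with "file:", so the two forms cannot be confused and raw_name(display_name(raw)) gives back the original bytes for any input, including names that are not valid UTF-8.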

    My opinion is that the OS just needs to support the "file:" URL-encoding
    mechanism natively in all its filesystem APIs (file opening, creation,
    deletion, linking, dirent...), and all the problems caused by variable
    interpretations of binary encodings of Unix filenames are definitely gone.
    This means that existing filenames that currently start with "file:" or with a
    URL-encoding scheme must be given to this interface with "%2A" instead of

    There's absolutely NO need to override UTF-8.

    This archive was generated by hypermail 2.1.5 : Tue Nov 29 2005 - 12:20:05 CST