Re: Representing Unix filenames in Unicode

From: Christopher JS Vance (
Date: Sun Nov 27 2005 - 20:39:01 CST

    On Sun, Nov 27, 2005 at 06:45:23PM +0100, Hans Aberg wrote:
    >This problem has recently been discussed in the POSIX/UNIX
    >standardization list (Austin Group List,
    >austin/). It should really be best resolved there, because one needs
    >to find an efficient solution for a UTF-8 enabled UNIX OS, and in
    >doing that, one has to take things into account such as how to
    >implement efficient file systems. One possible approach might be to
    >ensure any byte string can be represented at the filesystem level,
    >with suitable UTF-8 encodings for use in text strings (and the
    >property that they can be lifted back to the original byte strings),
    >which may vary from context to context. This approach would be
    >motivated by the fact that almost all filesystems already work this
    >way, and that it would be inefficient to burden it with character
    >interpretation schemes. But some filesystems, though rare it seems,
    >use a different approach. And when fiddling around with this, one
    >needs to assess its effect on the total UNIX OS, probably making some
    >implementations first. In the meantime, I figure you can invent the
    >encoding scheme that best fits your needs.
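
    The quoted idea (arbitrary byte strings on the filesystem, lifted to
    text and lowered back without loss) can be sketched as follows. This is
    only one possible scheme, not one proposed in this thread: it assumes
    Python's "surrogateescape" error handler, and the helper names
    lift_to_text and lower_to_bytes are hypothetical.

```python
# A minimal sketch, assuming Python's "surrogateescape" error handler.
# lift_to_text and lower_to_bytes are hypothetical names for illustration.

def lift_to_text(raw: bytes) -> str:
    """Decode a filename as UTF-8; each undecodable byte becomes a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def lower_to_bytes(text: str) -> bytes:
    """Invert lift_to_text, restoring the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


raw_name = b"caf\xe9.txt"            # Latin-1 bytes, not valid UTF-8
text_name = lift_to_text(raw_name)   # usable as a text string
assert lower_to_bytes(text_name) == raw_name
```

    Valid UTF-8 names pass through unchanged, so only names that are not
    UTF-8 pay for the escaping. The lifted string is not itself well-formed
    Unicode text, which is the price of the lossless round trip.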

    UTF-8, created as FSS-UTF, was invented specifically to enable its use
    for Unix/POSIX and similar filenames.
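
    One property that makes UTF-8 fit Unix filenames is that every byte of
    a multi-byte sequence has its high bit set, so an encoded non-ASCII
    character can never contain the '/' or NUL bytes that the kernel treats
    specially. A small check of that property, with a hypothetical helper
    name and an arbitrary choice of sample characters:

```python
# A minimal sketch; multibyte_is_path_safe is a hypothetical helper name.

def multibyte_is_path_safe(ch: str) -> bool:
    """True if every UTF-8 byte of a non-ASCII character is >= 0x80."""
    if ord(ch) < 0x80:
        raise ValueError("expected a non-ASCII character")
    return all(byte >= 0x80 for byte in ch.encode("utf-8"))


# Sample characters chosen for illustration, including U+2215 DIVISION SLASH.
samples = ["\u00e9", "\u65e5", "\u2215", "\U0001F600"]
assert all(multibyte_is_path_safe(ch) for ch in samples)
```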

    The problem is people trying to create filenames which aren't UTF-8.
    Provided you use the same character set for all filenames, the problem
    was solved before the Unicode/10646 merger (see Plan 9 from Bell Labs).
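
    Detecting the problem case, a filename that is not UTF-8, is
    straightforward with a strict decoder. A minimal sketch, where
    is_utf8_filename is a hypothetical helper and not part of any standard
    API:

```python
# A minimal sketch; is_utf8_filename is a hypothetical helper name.

def is_utf8_filename(raw: bytes) -> bool:
    """True if the raw filename bytes form well-formed UTF-8."""
    try:
        raw.decode("utf-8")   # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


assert is_utf8_filename("r\u00e9sum\u00e9.txt".encode("utf-8"))
assert not is_utf8_filename(b"r\xe9sum\xe9.txt")   # Latin-1 encoded name
```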

    Christopher Vance
