Re: Representing Unix filenames in Unicode

From: Hans Aberg
Date: Sun Nov 27 2005 - 23:28:36 CST


    On 28 Nov 2005, at 03:39, Christopher JS Vance wrote:

    > UTF-8, created as FSS-UTF, was invented specifically to enable its use
    > for Unix/POSIX and similar filenames.


    > The problem is people trying to create filenames which aren't UTF-8.
    > Provided you use the same character set for all filenames, the problem
    > was solved before the Unicode/10646 merger (see Plan 9 from Bell
    > Labs).

    Right, on a high level, the human interface level. But the problem
    is that the same character set is not going to be used for all
    filenames, especially when you mix filesystems. One cannot even be
    sure that the Unicode/10646 set will be a final character set. On a
    low level, the computer-to-computer interface level, almost all
    filesystems do not interpret the byte strings used as filenames (only
    one exception was quoted on the UNIX/POSIX list), and there is no
    obvious benefit in doing so. For example, the case-insensitive Mac OS
    HFS filesystem stores the filenames as is, and emulates the case
    insensitivity through the interface functions that address it. By
    overriding those functions, case sensitivity can be implemented, and
    in addition, Apple now has a case-sensitive version. Most facts point
    to Unicode/10646 being a human interface, not a computer-to-computer
    interface. But it is not impossible to do it otherwise; it is just a
    question of what might be most efficient.
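
    A minimal sketch of that layering, in Python, may make the split
    concrete. It is an illustration only and does not reproduce HFS or
    any real filesystem: the names OpaqueStore, fold and
    lookup_case_insensitive are hypothetical, and treating the bytes as
    UTF-8 in the interface layer is an assumption made for the example.

```python
class OpaqueStore:
    """Low level: names are uninterpreted byte strings, stored as is."""

    def __init__(self):
        self._entries = {}  # maps raw name bytes to file contents

    def create(self, name: bytes, data: bytes) -> None:
        self._entries[name] = data

    def lookup(self, name: bytes):
        # Exact byte-for-byte match; no character set is assumed here.
        return self._entries.get(name)

    def names(self):
        return list(self._entries)


def fold(name: bytes) -> str:
    """Interface level: interpret the bytes (assumed UTF-8) and case-fold.

    Bytes that are not valid UTF-8 are carried through via
    surrogateescape instead of raising an error.
    """
    return name.decode("utf-8", errors="surrogateescape").casefold()


def lookup_case_insensitive(store: OpaqueStore, name: bytes):
    """Emulate case insensitivity on top of the byte-exact store."""
    wanted = fold(name)
    for stored in store.names():
        if fold(stored) == wanted:
            return store.lookup(stored)
    return None


store = OpaqueStore()
store.create(b"ReadMe.TXT", b"hello")
store.create(b"\xff\xfe", b"raw")  # not valid UTF-8, still a usable name

exact = store.lookup(b"readme.txt")                       # byte-exact lookup
folded = lookup_case_insensitive(store, b"readme.txt")    # interface lookup
```

    Swapping lookup_case_insensitive for store.lookup is all it takes
    to get case-sensitive behaviour here; the stored bytes never change.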

       Hans Aberg
