Re: Representing Unix filenames in Unicode

From: Marcin 'Qrczak' Kowalczyk (
Date: Sun Nov 27 2005 - 16:45:17 CST

    "Philippe Verdy" <> writes:

    > Java already has a method to query the effective (canonical) filename
    > that is used on the filesystem after creation. So applications should
    > use it (if not, it's an application bug, not a Java API design bug).

    Which method? But it would not help anyway. I believe Java doesn't
    specify how strings are mapped to filenames and vice versa, so
    implementations are on their own. And existing implementations
    don't allow access to arbitrary files on Unix in UTF-8 locales
    (although Sun's Java allows it in system encodings where every byte
    sequence is decodable).
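
    To make the decodability point concrete, here is a minimal sketch in
    Python (not Java; the filename and the helper try_decode are
    hypothetical, chosen only for illustration). It shows why a strict
    UTF-8 mapping cannot name every file, while an encoding in which
    every byte sequence is decodable, such as Latin-1, can.

```python
# A Unix filename is just a byte string; this one is not valid UTF-8.
raw_name = b"report-\xff\xfe.txt"


def try_decode(name: bytes, encoding: str):
    """Return the decoded string, or None if the bytes are invalid in that encoding."""
    try:
        return name.decode(encoding)
    except UnicodeDecodeError:
        return None


# The bytes 0xFF and 0xFE never occur in well-formed UTF-8, so this fails.
utf8_result = try_decode(raw_name, "utf-8")

# Latin-1 maps each of the 256 byte values to a code point, so this always succeeds.
latin1_result = try_decode(raw_name, "latin-1")

# The Latin-1 string encodes back to the exact original bytes,
# so the file remains reachable through a string-based API.
round_tripped = latin1_result.encode("latin-1")
```

    With a strict UTF-8 mapping the runtime has no string to hand to the
    program for this file, so a string-only API cannot open it.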

    > Filesystems that currently allow storing random byte strings are
    > bogus and should be corrected (the historic UFS filesystem for Unix
    > needs a fix, at least in its associated filesystem tools like "fsck").

    Doesn't matter: I want the most usable solution which works on current
    systems, whether we like the design of these systems or not. I will
    not try to change now how Unix represents filenames. I can only
    influence how the runtime of the language exposes them to the program.
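
    As an illustration of what "how the runtime exposes them" can mean,
    here is a sketch of one possible scheme, not necessarily the one
    under discussion in this thread: undecodable bytes are mapped to lone
    surrogate code points so that the string converts back to the exact
    original bytes. Python 3 later adopted this under the name
    "surrogateescape". The function names expose and unexpose and the
    sample filename are hypothetical.

```python
def expose(name: bytes) -> str:
    """Decode a raw filename; each undecodable byte 0xNN becomes the lone surrogate U+DCNN."""
    return name.decode("utf-8", errors="surrogateescape")


def unexpose(text: str) -> bytes:
    """Inverse of expose(): turn escaped surrogates back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# Latin-1 encoded "café.txt" as it might appear on disk in a UTF-8 locale.
raw_name = b"caf\xe9.txt"

as_text = expose(raw_name)      # the stray byte 0xE9 is carried as U+DCE9
recovered = unexpose(as_text)   # identical to raw_name

# Well-formed UTF-8 names are unaffected by the escape mechanism.
normal_text = expose("zażółć.txt".encode("utf-8"))
```

    The trade-off is that the resulting strings may contain lone
    surrogates, so they are not valid Unicode text and need care when
    printed or written out in a strict encoding.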

    > There's absolutely no reason for applications running on the same
    > system to use multiple encodings that the OS can't know.

    There are very good reasons: if the application makes a backup,
    it should handle *all* files the OS is willing to let it access,
    no matter how ugly their names seem to us. The fact that some
    filenames are not decodable using the default locale encoding
    is a poor excuse. And it does occasionally happen.
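
    A rough sketch of the backup case, assuming a runtime that lets the
    program work with raw byte names (Python does when paths are passed
    as bytes). The function backup_dir is hypothetical and copies only
    the regular files directly inside one directory; it never decodes a
    name, so names that are invalid in the locale encoding are handled
    like any other.

```python
import os
import shutil


def backup_dir(src: bytes, dst: bytes) -> list:
    """Copy every regular file directly inside src into dst, using byte names throughout."""
    os.makedirs(dst, exist_ok=True)
    copied = []
    # Passing bytes to os.listdir makes it return bytes, so no name is decoded.
    for name in sorted(os.listdir(src)):
        src_path = os.path.join(src, name)
        if not os.path.isfile(src_path):
            continue  # this sketch skips subdirectories and special files
        with open(src_path, "rb") as fin, open(os.path.join(dst, name), "wb") as fout:
            shutil.copyfileobj(fin, fout)
        copied.append(name)
    return copied
```

    Called as backup_dir(b"/some/source", b"/some/target") (hypothetical
    paths), it returns the list of byte names it copied.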

    > If there must exist several encodings depending on the user's
    > locale, then the user's locale setting must be accessible to the OS
    > itself (so the locale system must become part of it, part of its
    > kernel services, instead of being outside in an application library).

    Doesn't matter whether it should, because it isn't. I'm not designing
    an operating system now. I'm designing a language which runs on
    existing OSes and must play by their rules.

       __("<         Marcin Kowalczyk
