RE: Roundtripping in Unicode

From: Lars Kristan
Date: Wed Dec 15 2004 - 08:50:27 CST

    Arcane Jill wrote:
    > The obvious solution is for all Unix machines everywhere to
    > be using the
    > same locale - and it had better be UTF-8. But an instantaneous global
    > switch-over is never going to happen, so we see this gradual
    > switch-over ...
    > and it is during this transition phase that Lars's problem manifests.
    Yes, some may not experience it, some will experience it for a day, some for
    a month, some for a year, some indefinitely.
    And unless filesystems prevent invalid sequences from being added, it will
    keep happening to everybody. And if it happens only very seldom, it will be
    even harder to find a person who can fix it.

    > Of course, you are not /really/ suggesting that the Unix kernel
    > be rewritten. But it's hard for me to see how else this could be
    > achieved.

    What one might pursue is to make the UNIX filesystem invariant, i.e.
    Windows-like. In that scenario, the filesystem stores Unicode strings and
    adjusts the representation of filenames according to the user's locale. But
    there are two reasons against it:

    A - If only the filesystem does it, then whenever you switch the locale, all
    references to filenames stored inside other files break. Unless you treat
    the contents of files in the same manner, which is what Windows does if an
    application is not Unicode (with a number of associated problems on top).
    But that is not what is supposed to be done on UNIX.

    B - As we move to UTF-8, there will be less and less need to use different
    locales. So why bother with enabling the system to represent UTF-8 in any
    other locale if that locale will not even be used anymore? Concerns about
    the transition period do apply, but then you end up with two transitions,
    which is even less appealing.
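
Point A can be illustrated with a rough sketch. It assumes a hypothetical
invariant filesystem that stores the name as a Unicode string and hands each
process the bytes for its own locale encoding. The function names and the
sample filename are made up for illustration; this is not how any real
filesystem is implemented.

```python
from typing import Optional


def represent(name: str, locale_encoding: str) -> bytes:
    """Bytes a process in the given locale would see for a stored Unicode name."""
    return name.encode(locale_encoding)


def resolve(reference: bytes, locale_encoding: str) -> Optional[str]:
    """Map a byte reference back to the stored name, or None if it does not decode."""
    try:
        return reference.decode(locale_encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical stored name containing one non-ASCII character (U+00EF).
stored = "na\u00efve.txt"

latin1_view = represent(stored, "iso-8859-1")
utf8_view = represent(stored, "utf-8")

# A reference written into some other file while the user was in a Latin-1 locale.
saved_reference = latin1_view

# Same locale: the reference still resolves to the stored name.
same_locale = resolve(saved_reference, "iso-8859-1")

# After switching to a UTF-8 locale, the saved bytes no longer decode.
after_switch = resolve(saved_reference, "utf-8")
```

The filesystem itself stays consistent in both locales; it is the byte
reference saved elsewhere that stops matching once the locale changes.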

    So, the only perceivable option is to start thinking about validation in the
    filesystem, if and when one chooses to enable it. But keep in mind that it
    will only reduce the problem. Not all programs will be able to rely on it
    (like virus scanners, HSM, backup, ...).
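
As a minimal sketch of what optional validation in the filesystem could look
like, assume filenames arrive as raw bytes and that a per-filesystem switch
(here the hypothetical `validate` parameter) decides whether invalid UTF-8 is
rejected. The function names are illustrative only and do not correspond to
any existing kernel or library interface.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8."""
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, overlong forms and encoded surrogates.
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def check_new_name(name: bytes, validate: bool = True) -> bytes:
    """Accept a new filename, refusing invalid UTF-8 when validation is enabled."""
    if validate and not is_valid_utf8(name):
        raise ValueError("filename is not valid UTF-8: %r" % (name,))
    return name


# A UTF-8 encoded name passes; a Latin-1 encoded name is refused
# unless validation is switched off.
ok_name = check_new_name("caf\u00e9.txt".encode("utf-8"))
legacy_name = "caf\u00e9.txt".encode("iso-8859-1")
legacy_allowed = check_new_name(legacy_name, validate=False)
```

A check like this only guards names created after it is enabled. Names that
already exist, or that live on a filesystem with validation off, can still be
invalid, which is why programs such as backup tools would still need to cope
with arbitrary bytes.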

