Re: Nicest UTF

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Sun Dec 12 2004 - 04:52:23 CST

    Lars Kristan <lars.kristan@hermes.si> writes:

    > My my, you are assuming all files are in the same encoding.

    Yes. Otherwise nothing shows filenames correctly to the user.

    > And what about all the references to the files in scripts?
    > In configuration files?

    Such files rarely use non-ASCII characters. Non-ASCII characters are
    primarily used in names of documents created explicitly by the user.

    > Soft links?

    They can be fixed automatically.

    > If you want to break things, this is definitely the way to do it.

    Using non-ASCII filenames is risky to begin with. Existing tools don't
    have a good answer to what should happen with these files when the
    default encoding used by the user changes, or when a user using a
    different encoding tries to access them.

    As long as everybody uses the same encoding and files use it too,
    things work. When the assumption is false, something will break.

    >> You mean, various programs will break at various points of time,
    >> instead of working correctly from the beginning?
    >
    > So far nothing broke. Because all the programs are in UTF-8.

    This doesn't imply that they won't break. You are talking about
    filenames which are *not* UTF-8, with the locale set to UTF-8.
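
    A minimal sketch of that situation, assuming a hypothetical filename
    (the Polish word "żółw") that was written by a program running under
    ISO-8859-2 and is later read by a program expecting UTF-8. The helper
    name decodes_as_utf8 is made up for illustration.

```python
# Hypothetical filename, encoded two ways. Only the bytes reach the file system.
name_latin2 = "żółw".encode("iso-8859-2")  # written under an ISO-8859-2 locale
name_utf8 = "żółw".encode("utf-8")         # written under a UTF-8 locale


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for raw in (name_utf8, name_latin2):
        print(raw, decodes_as_utf8(raw))
```

    The ISO-8859-2 form starts with a byte that UTF-8 only allows as a
    continuation byte, so a strict UTF-8 decoder rejects the name.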

    Mozilla doesn't show such filenames in a directory listing. You
    may consider it a bug, but that is how it behaves. Producing non-UTF-8
    HTML labeled as UTF-8 would be wrong too. There is no good solution
    to the problem of filenames stored in different encodings.

    Handling such filenames is incompatible with using Unicode to process
    strings. You have to go back to passing arrays of bytes with ambiguous
    interpretation of non-ASCII characters, and live with inconveniences
    like displaying garbage for non-ASCII filenames and broken sorting.
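
    Both inconveniences can be sketched with two hypothetical filenames
    stored in different encodings in the same directory. The names and
    the display helper are assumptions made for illustration, not taken
    from any particular tool.

```python
# Two hypothetical names in one directory, stored in different encodings.
names = [
    "żuk".encode("iso-8859-2"),  # written under an ISO-8859-2 locale
    "żaba".encode("utf-8"),      # written under a UTF-8 locale
]


def display(raw: bytes) -> str:
    """What a UTF-8 program can show: undecodable bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Sorting the raw byte arrays compares encoded bytes, not characters.
byte_order = sorted(names)

# Sorting the intended text compares characters (here by code point).
text_order = sorted(["żuk", "żaba"])

if __name__ == "__main__":
    print([display(n) for n in byte_order])
    print(text_order)
```

    Sorted as bytes, the ISO-8859-2 name comes first even though "żaba"
    precedes "żuk" as text, and the ISO-8859-2 name is displayed with a
    replacement character in place of its first letter.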

    >> Mixing any two incompatible filename encodings on the same file system
    >> is a bad idea.
    >
    > As soon as you realize you cannot convert filenames to UTF-8, you
    > will see that all you can do is start adding new ones in UTF-8.
    > Or forget about Unicode.

    I'm not using a UTF-8 locale yet, because too many programs don't
    support it. I'm using ISO-8859-2. But almost all filenames are ASCII.

    -- 
       __("<         Marcin Kowalczyk
       \__/       qrczak@knm.org.pl
        ^^     http://qrnik.knm.org.pl/~qrczak/
    

