From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Sun Dec 12 2004 - 04:52:23 CST
Lars Kristan <lars.kristan@hermes.si> writes:
> My my, you are assuming all files are in the same encoding.
Yes. Otherwise nothing shows filenames correctly to the user.
> And what about all the references to the files in scripts?
> In configuration files?
Such files rarely use non-ASCII characters. Non-ASCII characters are
primarily used in names of documents created explicitly by the user.
> Soft links?
They can be fixed automatically.
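A minimal sketch of what such an automatic fix might look like, assuming the old names are ISO-8859-2 and the new ones UTF-8. The helper names `reencode` and `fix_symlink` are hypothetical, and the code assumes every target really is in the source encoding; a target that is already UTF-8 would be mangled.

```python
import os

def reencode(raw, src="iso-8859-2", dst="utf-8"):
    """Reinterpret a byte path from the src encoding into the dst encoding."""
    return raw.decode(src).encode(dst)

def fix_symlink(link, src="iso-8859-2", dst="utf-8"):
    """Rewrite one symlink so its target uses the new encoding.

    `link` is a bytes path. Returns True if the link was rewritten,
    False if the target was unchanged (for example, pure ASCII).
    """
    old_target = os.readlink(link)          # bytes in, bytes out
    new_target = reencode(old_target, src, dst)
    if new_target == old_target:
        return False
    os.unlink(link)                         # removes the link, not its target
    os.symlink(new_target, link)
    return True
```

ASCII-only targets pass through unchanged, which is why only links to user-named documents would need rewriting.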
> If you want to break things, this is definitely the way to do it.
Using non-ASCII filenames is risky to begin with. Existing tools don't
have a good answer to what should happen with these files when the
default encoding used by the user changes, or when a user using a
different encoding tries to access them.
As long as everybody uses the same encoding and files use it too,
things work. When the assumption is false, something will break.
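To illustrate with a sketch (the filename is a made-up example): the same name produces different bytes in ISO-8859-2 and in UTF-8, and reading one as the other either fails or yields garbage.

```python
name = "żółw.txt"  # hypothetical Polish filename

latin2_bytes = name.encode("iso-8859-2")
utf8_bytes = name.encode("utf-8")

def decode_or_none(raw, encoding):
    """Decode raw bytes, returning None when they are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# Same encoding on both sides: the name round-trips.
same_encoding = decode_or_none(latin2_bytes, "iso-8859-2")

# ISO-8859-2 bytes read as UTF-8: not valid UTF-8 at all.
as_utf8 = decode_or_none(latin2_bytes, "utf-8")

# UTF-8 bytes read as ISO-8859-2: decodes, but to a different string.
as_latin2 = decode_or_none(utf8_bytes, "iso-8859-2")

print(same_encoding == name, as_utf8, as_latin2 == name)
```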
>> You mean, various programs will break at various points of time,
>> instead of working correctly from the beginning?
>
> So far nothing broke. Because all the programs are in UTF-8.
This doesn't imply that they won't break. You are talking about
filenames which are *not* UTF-8, with the locale set to UTF-8.
Mozilla doesn't show such filenames in a directory listing. You
may consider it a bug, but this is a fact. Producing non-UTF-8 HTML
labeled as UTF-8 would be wrong too. There is no good solution to
the problem of filenames encoded in different encodings.
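As a rough sketch of how a program could at least detect such names, the following lists directory entries as raw bytes and reports those that are not valid UTF-8. The function names are hypothetical; what to do with the offending names remains the open question.

```python
import os

def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def filter_non_utf8(names):
    """Keep only the byte names that are not valid UTF-8."""
    return [n for n in names if not is_valid_utf8(n)]

def non_utf8_names(directory=b"."):
    """List entries of a directory whose names are not valid UTF-8.

    Passing a bytes path makes os.listdir return bytes names,
    so no decoding happens behind our back.
    """
    return filter_non_utf8(os.listdir(directory))
```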
Handling such filenames is incompatible with using Unicode to process
strings. You have to go back to passing arrays of bytes with ambiguous
interpretation of non-ASCII characters, and live with inconveniences
like displaying garbage for non-ASCII filenames and broken sorting.
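A small sketch of both inconveniences, using made-up names: sorting raw bytes gives an order that depends on the encoding, and ISO-8859-2 bytes shown under a UTF-8 assumption come out as replacement characters.

```python
names = ["zebra", "żaba", "ósmy"]  # hypothetical filenames

def byte_sorted(names, encoding):
    """Sort names by their raw bytes in the given encoding, then decode back."""
    raw = sorted(n.encode(encoding) for n in names)
    return [r.decode(encoding) for r in raw]

# The same three names sort differently depending on the byte encoding.
latin2_order = byte_sorted(names, "iso-8859-2")
utf8_order = byte_sorted(names, "utf-8")

# ISO-8859-2 bytes displayed as if they were UTF-8:
# the invalid byte becomes U+FFFD.
shown = "żaba".encode("iso-8859-2").decode("utf-8", errors="replace")

print(latin2_order, utf8_order, shown)
```

Neither byte order is the order a Polish reader would expect; that would require decoding the names and applying locale-aware collation.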
>> Mixing any two incompatible filename encodings on the same file system
>> is a bad idea.
>
> As soon as you realize you cannot convert filenames to UTF-8, you
> will see that all you can do is start adding new ones in UTF-8.
> Or forget about Unicode.
I'm not using a UTF-8 locale yet, because too many programs don't
support it. I'm using ISO-8859-2. But almost all filenames are ASCII.
-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/