RE: UTF-8 vs. Non-UTF-8 Locales and File Names (WAS: Re: Roundtripping in Unicode)

From: Lars Kristan (lars.kristan@hermes.si)
Date: Wed Dec 15 2004 - 05:38:30 CST

Next message: D. Starner: "RE: Roundtripping in Unicode"

Previous message: Arcane Jill: "Roundtripping Solved"

Edward H. Trager wrote:
> UTF-8's home directory). So both users could probably guess
> the filename
> they were looking at.
Which, BTW, is true for most of Europe but is not true for some other
combinations of locales.

>
> d??claration_des_droits.utf8
>
> The terminal, being set to interpret the legacy locale, does not know
> how to interpret the two bytes that are used for the UTF-8 "é".

This is well known but is only the start of what the thread was discussing.

Your example only shows a difference in interpretation. You are still able
to copy and paste the filename, use it in scripts and open it in any
program.
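
A minimal sketch of why that direction is harmless, assuming the legacy
locale is Latin 1 (the quoted example does not say which one it was). Every
byte value is a valid Latin 1 character, so the UTF-8 name is displayed
wrongly but converts back to exactly the same bytes. The variable names are
illustrative only.

```python
# "\u00e9" is e-acute; the name follows the quoted example.
name = "d\u00e9claration_des_droits.utf8"

on_disk = name.encode("utf-8")        # bytes written under a UTF-8 locale
shown = on_disk.decode("latin-1")     # what a Latin 1 reader makes of them
recovered = shown.encode("latin-1")   # pasting the displayed name back

# The text looks wrong, but the bytes survive the round trip.
print(shown)
print(recovered == on_disk)
```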

Now switch your locale to Latin 1 and create a file with that name in Latin
1. Switch back to UTF-8 and try doing various things with this file. I
assume the following happens:

1 - Instead of being misinterpreted, letters are lost, leading to empty
filenames in extreme cases.
2 - You cannot open the file by copying its name from the terminal.
3 - You can probably still specify it in scripts (which need to be edited in
Latin 1), but if someone started validating scripts in a UTF-8 locale,
you would lose that ability.
4 - Most C programs should be able to process the file. But I would not bet
on some more 'advanced' languages. The more they comply with Unicode, the
less likely it is that they will open the file.
5 - Windows is likely to have problems accessing that file.
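
The asymmetry behind points 1, 2 and 4 can be sketched as follows. This is
only an illustration under the assumption that the file was named in Latin 1;
it operates on byte strings and does not touch a real filesystem. The last
step shows the "surrogateescape" error handler that Python itself provides
for this situation. It is one language's workaround, not something proposed
in this message.

```python
# Name as written under a Latin 1 locale: e-acute is the single byte 0xE9.
latin1_bytes = "d\u00e9claration_des_droits".encode("latin-1")

# A strictly validating UTF-8 reader rejects the name outright.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient reader substitutes U+FFFD; the original byte is gone.
lossy = latin1_bytes.decode("utf-8", errors="replace")
lossy_roundtrips = lossy.encode("utf-8") == latin1_bytes

# Python's surrogateescape handler smuggles the bad byte through instead.
escaped = latin1_bytes.decode("utf-8", errors="surrogateescape")
escaped_roundtrips = (
    escaped.encode("utf-8", errors="surrogateescape") == latin1_bytes
)

print(strict_ok, lossy_roundtrips, escaped_roundtrips)
```

The escaped string is not valid Unicode text, so it still cannot be printed
or pasted as a real character; it only keeps the bytes recoverable.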

And, yes, the solution is still to convert all filenames to UTF-8. That is,
if all users on a particular system agree that this is what should be done
with their files. But that does not prevent such files from being generated,
whatever the reason or cause.

Lars

This archive was generated by hypermail 2.1.5 : Wed Dec 15 2004 - 05:47:44 CST