Re: Roundtripping Solved

From: Peter Kirk
Date: Fri Dec 17 2004 - 05:43:58 CST

    On 17/12/2004 10:13, Arcane Jill wrote:

    > ...
    > One last question - why /can't/ locale conversion be automated? I
    > don't really get this one, but it's the root of this whole topic.
    > Surely, if we make the following assumptions:
    > (1) No user has a locale of UTF-8, and
    > (2) Some users will have created UTF-8 filenames and UTF-8 text files,
    > and
    > (3) Some of those text files may have been concatenated, leading to
    > mixed-encoding text files
    > then we can surely automate everything. (Requirement (1) can be met
    > simply by asking all users who have changed their locale to UTF-8 to
    > change it back again, temporarily). ...

    This locale change is not exactly simple for (future?) users who only
    speak and use a language which is supported only by UTF-8 - which would
    include most Indians and SE Asians for a start.

    > Assuming these requirements, all you have to do is:
    > ...
    > # if (the file can be positively identified as a text file)
    > # {
    > # re-encode all non-UTF-8 substrings (assuming them to
    > # be in the user's locale) to UTF-8

    This assumption is invalid. I have on my system a number of files which
    are text files but encoded neither in UTF-8 nor in my own locale. I read
    them either with programs which can display them according to their
    locale (which is not encoded within the file) or by using substitution
    fonts (which is justified because many of them were written for just
    such obsolescent setups). This kind of automated conversion would cause
    disastrous damage.
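
    To make the failure concrete, here is a minimal sketch of the quoted
    re-encoding step, followed by what it does to a file that is in neither
    UTF-8 nor the user's locale. The function name and the two encodings
    (Latin-1 standing in for the user's locale, KOI8-R for the legacy file)
    are assumptions chosen for illustration only; they do not come from the
    proposal or from my own files.

```python
def reencode_assuming_locale(data: bytes, locale_encoding: str) -> str:
    """Hypothetical sketch of the quoted step: keep valid UTF-8 sequences,
    and decode every other byte as if it were in the user's locale."""
    out = []
    i = 0
    while i < len(data):
        # Try to read one well-formed UTF-8 character (1 to 4 bytes) at i.
        for width in (1, 2, 3, 4):
            chunk = data[i:i + width]
            try:
                out.append(chunk.decode("utf-8"))
                i += len(chunk)
                break
            except UnicodeDecodeError:
                continue
        else:
            # Not UTF-8: assume the byte belongs to the locale encoding.
            out.append(data[i:i + 1].decode(locale_encoding))
            i += 1
    return "".join(out)


# Assumed example: the user's locale is Latin-1, but the file is KOI8-R.
original = "Привет"
legacy_bytes = original.encode("koi8_r")

converted = reencode_assuming_locale(legacy_bytes, "latin-1")

print(converted == original)  # the Russian text has not survived
print(repr(converted))        # Latin-1 letters with the same byte values
```

    Nothing in the conversion fails or warns: every byte is a legal Latin-1
    character, so the text is silently rewritten as something else, and once
    the result is saved as UTF-8 the file no longer contains the bytes that
    the original programs or substitution fonts expect.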

    Peter Kirk (personal) (work)

