RE: Roundtripping in Unicode

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Wed Dec 15 2004 - 04:27:07 CST

    -----Original Message-----
    From: unicode-bounce@unicode.org On Behalf Of Philippe Verdy
    Sent: 14 December 2004 22:47
    To: Marcin 'Qrczak' Kowalczyk
    Cc: unicode@unicode.org
    Subject: Re: Roundtripping in Unicode

    >From: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
    >> "Arcane Jill" <arcanejill@ramonsky.com> writes:
    >>> If so, Marcin, what exactly is the error, and whose fault is it?
    >>
    >> It's an error to use locales with different encodings on the same
    >> system.

    I confess I don't know much about Unix, but still, I'm not sure your
    assertion (Marcin) makes sense. Unix is a multi-user system. If you log on
    as User A, then User B's settings are hidden from you, unless User B has
    explicitly decided to share them. There may even be users of whose
    existence you are not even aware. Unix makes it possible for
    /you/ to change /your/ locale - but by your reasoning, this is an error,
    unless all other users do so simultaneously. Your reasoning implies that no
    Unix user should ever change their locale unless they have an absolute
    guarantee that all other users are going to do so simultaneously ... but I
    don't know if you can ever get such a guarantee. Or maybe you're saying that
    the error lies with Unix itself. Maybe that's fair comment, but I gather
    Unix was invented before Unicode, so it can hardly be blamed for breaking
    Unicode's conceptual model.

    But it goes beyond that. Copy a file onto a floppy disc and then physically
    take that floppy disc to a different Unix machine and log on as "guest" and
    insert the disc ... Will the filename look the same? It would seem that "the
    same system" is effectively every Unix machine on the planet, since files
    may be interchanged between them.
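
    To make the floppy-disc scenario concrete, here is a minimal sketch in
    Python. The filename is hypothetical, and decoding with the "replace"
    error handler is only an assumption about how a listing tool might
    behave; real tools differ.

```python
# Hypothetical filename chosen for illustration; it does not come from the thread.
name = "caf\u00e9.txt"  # "café.txt"

# Bytes stored on disc by a user whose locale encoding is ISO-8859-1.
raw_latin1 = name.encode("iso-8859-1")

# Bytes stored on disc by a user whose locale encoding is UTF-8.
raw_utf8 = name.encode("utf-8")


def decode_for_display(raw: bytes, locale_encoding: str) -> str:
    """Decode filename bytes as a tool running in the given locale might,
    substituting U+FFFD for bytes that are invalid in that encoding."""
    return raw.decode(locale_encoding, errors="replace")


# The same bytes, viewed from the "other" locale.
seen_by_utf8_user = decode_for_display(raw_latin1, "utf-8")
seen_by_latin1_user = decode_for_display(raw_utf8, "iso-8859-1")

print(repr(seen_by_utf8_user))
print(repr(seen_by_latin1_user))
```

    Neither user sees the name the other one typed, although the bytes on
    the disc never changed.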

    The obvious solution is for all Unix machines everywhere to be using the
    same locale - and it had better be UTF-8. But an instantaneous global
    switch-over is never going to happen, so we see this gradual switch-over ...
    and it is during this transition phase that Lars's problem manifests.

    Philippe adds...
    >More simply, I think that it's an error to have the encoding part of any
    >locale...

    which again attaches blame to Unix itself. All very "not my problem", but I
    think Lars has found that it actually /is/ his problem. (Not that I support
    his solution).

    >The system should not depend on them, and for critical things like
    >filesystem volumes, the encoding should be forced by the filesystem itself,
    >and applications should mandatorily follow the filesystem rules.

    Of course, you are not /really/ suggesting that the Unix kernel be
    rewritten. But it's hard for me to see how else this could be
    achieved.

    >Now think about the web itself: it's really a filesystem, with billions
    >users, or trillion applications using simultaneously hundreds or thousands
    >of incompatible encodings... Many resources on the web seem to have valid
    >URLs for some users but not for others, until URLs are made independant to
    >any user locale, and then not considered as encoded plain-text but only as
    >strings of bytes.

    Oh yeah - and that too. Well spotted.
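
    The URL case can be sketched the same way. This assumes Python's
    standard urllib.parse module; the path segment is hypothetical.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical path segment, for illustration only.
segment = "caf\u00e9"  # "café"

# The same text percent-encoded under two different locale encodings
# yields two different URLs.
url_latin1 = quote(segment, encoding="iso-8859-1")
url_utf8 = quote(segment, encoding="utf-8")

# Treating a URL as a string of bytes avoids guessing the encoding at all:
# the percent-escapes are turned back into raw bytes and left undecoded.
raw = unquote_to_bytes(url_latin1)

print(url_latin1, url_utf8, raw)
```

    A server that stored the resource under one form will not find it under
    the other unless both sides agree on the encoding, or compare bytes only.
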
    Jill


