Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Thu Dec 09 2004 - 05:29:17 CST


    On Monday, December 6th, 2004 20:52Z John Cowan wrote:

    > Doug Ewell scripsit:
    >
    >>> Now suppose you have a UNIX filesystem, containing filenames in a
    >>> legacy encoding (possibly even more than one). If one wants to
    >>> switch to UTF-8 filenames, what is one supposed to do? Convert all
    >>> filenames to UTF-8?
    >>
    >> Well, yes. Doesn't the file system dictate what encoding it uses for
    >> file names? How would it interpret file names with "unknown"
    >> characters from a legacy encoding? How would they be handled in a
    >> directory search?
    >
    > Windows filesystems do know what encoding they use.

    Err, not really. MS-DOS *needs to know* the encoding to use, a bit like a
    *nix application that displays filenames needs to know the encoding in order
    to use the correct set of glyphs (but the constraints are much heavier). Also,
    Windows NT Unicode applications know it, because it can't be changed :-).

    But when it comes to other Windows applications (still the most common) that
    happen to operate in 'Ansi' mode, they are subject to the hazards of codepage
    translation. Even if Windows 'knows' the encoding used for the filesystem
    (as when it uses NTFS or Joliet, or VFAT on NT kernels; in the other cases
    it does not even know it, much like *nix kernels), the only usable set
    is the _intersection_ of the set used to write and the set used to read;
    that is, usually, it is restricted to US-ASCII, very much like the usable
    set in the *nix case...
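
    To make the intersection argument concrete, here is a minimal sketch in
    Python. It assumes the built-in cp1252 and cp437 codecs as stand-ins for an
    'Ansi' and an OEM code page; that pairing and the function names are
    illustrative assumptions only, not anything Windows itself exposes.

```python
# Illustrative sketch: which characters survive when a name is written as
# bytes under one code page and read back under another?
# cp1252 ('Ansi') and cp437 (OEM) are assumed example code pages.

def survives(name: str, write_cp: str, read_cp: str) -> bool:
    """True if `name` encoded with write_cp decodes to the same text with read_cp."""
    try:
        return name.encode(write_cp).decode(read_cp) == name
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False


def usable_set(write_cp: str, read_cp: str) -> set[str]:
    """Characters whose single-byte value means the same thing in both code pages."""
    usable = set()
    for value in range(256):
        raw = bytes([value])
        try:
            written, read = raw.decode(write_cp), raw.decode(read_cp)
        except UnicodeDecodeError:
            continue  # byte is unassigned in one of the code pages
        if written == read:
            usable.add(written)
    return usable


if __name__ == "__main__":
    common = usable_set("cp1252", "cp437")
    print(len(common), "bytes mean the same character in both code pages")
    print(survives("report.txt", "cp1252", "cp437"))
    print(survives("café.txt", "cp1252", "cp437"))
```

    A plain ASCII name round-trips, while a name with an accented letter is
    read back as something else, which is the hazard described above.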

    Antoine


