From: John Cowan (jcowan@reutershealth.com)
Date: Mon Dec 06 2004 - 14:52:31 CST
Doug Ewell scripsit:
> > Now suppose you have a UNIX filesystem, containing filenames in a
> > legacy encoding (possibly even more than one). If one wants to switch
> > to UTF-8 filenames, what is one supposed to do? Convert all filenames
> > to UTF-8?
>
> Well, yes. Doesn't the file system dictate what encoding it uses for
> file names? How would it interpret file names with "unknown" characters
> from a legacy encoding? How would they be handled in a directory
> search?
Windows filesystems do know what encoding they use. But a filename on
a Unix(oid) file system is a mere sequence of octets, of which only 00
and 2F are interpreted. (Filenames containing 20, and especially 0A,
are annoying to handle with standard tools, but not illegal.)
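
A minimal Python sketch of that rule, for illustration only. The helper
name is_legal_component is hypothetical (it is not part of any real API),
and it deliberately ignores length limits and the reserved names "." and
"..":

```python
def is_legal_component(name: bytes) -> bool:
    """Return True if `name` could serve as one Unix filename component.

    Hypothetical helper. Only two octets are special to the file system:
    0x00 ends the name and 0x2F ('/') separates path components.
    Length limits and the reserved names "." and ".." are not checked.
    """
    return len(name) > 0 and b"\x00" not in name and b"/" not in name


# 0xE9 alone is not valid UTF-8, yet it is a perfectly legal octet here.
legacy_name = b"caf\xe9"
# 0x20 (space) and 0x0A (newline) are annoying, but also legal.
awkward_name = b"two words\nand a newline"
# 0x2F would be read as a path separator, so it cannot be part of a name.
bad_name = b"a/b"
```

Note that the check never asks what characters the octets are meant to
represent; that question simply does not arise at this level.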
How these octet sequences are translated to characters, if at all,
is no concern of the file system's. Some higher-level tools, such as
directory listers and shells, have hardwired assumptions; others have
changeable assumptions; but all are assumptions.
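
To illustrate, here is a small Python sketch in which the same stored
octets yield different "names" depending on the encoding a tool assumes.
The function name interpret and the sample octets are made up for the
example; the surrogateescape error handler is the mechanism Python itself
provides for carrying undecodable octets through a string and back:

```python
def interpret(name: bytes, assumed_encoding: str) -> str:
    """Decode filename octets under an assumed encoding (hypothetical helper).

    errors="surrogateescape" maps undecodable octets to lone surrogates
    so the original octets can be recovered later.
    """
    return name.decode(assumed_encoding, errors="surrogateescape")


raw = b"caf\xe9"  # the octets as stored in the directory entry

as_latin1 = interpret(raw, "latin-1")  # 'café'
as_utf8 = interpret(raw, "utf-8")      # 'caf\udce9' (0xE9 is not valid UTF-8 here)

# Whatever the tool assumed, the file system still holds the same octets.
round_trip = as_utf8.encode("utf-8", errors="surrogateescape")
```

Neither reading is "the" filename; the file system stores only raw, and
round_trip shows that a careful tool can hand those octets back unchanged
even when its assumption was wrong.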
-- 
John Cowan  jcowan@reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friends or of thine own were: any man's death diminishes me,
because I am involved in mankind, and therefore never send to know for
whom the bell tolls; it tolls for thee. --John Donne