Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

From: John Cowan (jcowan@reutershealth.com)
Date: Mon Dec 06 2004 - 14:52:31 CST

Next message: Doug Ewell: "Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)"

Previous message: John Hudson: "Re: OpenType not for Open Communication?"
In reply to: Doug Ewell: "Invalid UTF-8 sequences (was: Re: Nicest UTF)"
Next in thread: Doug Ewell: "Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)"
Reply: Doug Ewell: "Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)"
Reply: Antoine Leca: "Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)"

Doug Ewell scripsit:

> > Now suppose you have a UNIX filesystem, containing filenames in a
> > legacy encoding (possibly even more than one). If one wants to switch
> > to UTF-8 filenames, what is one supposed to do? Convert all filenames
> > to UTF-8?
>
> Well, yes. Doesn't the file system dictate what encoding it uses for
> file names? How would it interpret file names with "unknown" characters
> from a legacy encoding? How would they be handled in a directory
> search?

Windows filesystems do know what encoding they use. But a filename on
a Unix(oid) file system is a mere sequence of octets, of which only 0x00
(NUL, the string terminator) and 0x2F ('/', the path separator) are
interpreted. (Filenames containing 0x20 (space), and especially 0x0A
(newline), are annoying to handle with standard tools, but not illegal.)
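To make the octet-level rule concrete, here is a minimal Python sketch. The helper names `is_valid_component` and `is_awkward` are hypothetical; the check deliberately ignores length limits such as NAME_MAX and the reserved entries `.` and `..`, and it involves no character encoding at all.

```python
# Sketch: at the file-system level a Unix filename component is raw octets.
# Only 0x00 (NUL) and 0x2F ('/') are special; no encoding is assumed.

def is_valid_component(name: bytes) -> bool:
    """True if `name` could be one path component at the octet level.
    Hypothetical helper: ignores NAME_MAX and the reserved '.' and '..'."""
    return len(name) > 0 and 0x00 not in name and 0x2F not in name

def is_awkward(name: bytes) -> bool:
    """Legal, but troublesome for word- and line-oriented tools:
    contains a space (0x20) or a newline (0x0A)."""
    return 0x20 in name or 0x0A in name

legacy = b"caf\xe9"                  # "café" in Latin-1; not valid UTF-8
print(is_valid_component(legacy))    # True: the octets are all that matter
print(is_valid_component(b"a/b"))    # False: 0x2F separates components
print(is_awkward(b"two\nlines"))     # True: legal but annoying
```

Note that a Latin-1 name which is invalid as UTF-8 passes the check: the file system itself has no opinion about encodings.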

How these octet sequences are translated to characters, if at all,
is no concern of the file system's. Some higher-level tools, such as
directory listers and shells, have hardwired assumptions; others have
changeable assumptions; but all are assumptions.
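A short Python sketch of how the same octets read differently depending on the assumption a tool makes. The helpers `interpret` and `restore` are hypothetical; they use Python's real `surrogateescape` error handler (PEP 383), which is also what `os.fsdecode` relies on for POSIX filenames, so that undecodable octets survive a round trip instead of being lost. This illustrates the point and is not a claim about how any particular shell or lister behaves.

```python
# Sketch: a tool's "encoding assumption" is just the codec it decodes with.

def interpret(name: bytes, assumed_encoding: str) -> str:
    """Decode raw filename octets under an assumed encoding (hypothetical
    helper). Undecodable octets become lone surrogates via surrogateescape."""
    return name.decode(assumed_encoding, errors="surrogateescape")

def restore(text: str, assumed_encoding: str) -> bytes:
    """Inverse of interpret(): recover the exact original octets."""
    return text.encode(assumed_encoding, errors="surrogateescape")

raw = b"caf\xc3\xa9"                  # the octets UTF-8 uses for "café"
print(interpret(raw, "utf-8"))        # café
print(interpret(raw, "latin-1"))      # cafÃ©  (same octets, other assumption)

legacy = b"caf\xe9"                   # "café" in Latin-1; lone 0xE9 is invalid UTF-8
shown = interpret(legacy, "utf-8")
print(repr(shown))                    # 'caf\udce9'
print(restore(shown, "utf-8") == legacy)  # True: nothing was lost
```

The design choice here is that decoding never fails: a name that does not fit the assumption is still representable, which matters when a directory holds names in more than one legacy encoding.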

-- 
John Cowan  jcowan@reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
No man is an island, entire of itself; every man is a piece of the
continent, a part of the main.  If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friends or of thine own were: any man's death diminishes me,
because I am involved in mankind, and therefore never send to know for
whom the bell tolls; it tolls for thee.  --John Donne


This archive was generated by hypermail 2.1.5 : Mon Dec 06 2004 - 14:53:38 CST