From: Peter Kirk (firstname.lastname@example.org)
Date: Wed Dec 15 2004 - 05:51:51 CST
On 15/12/2004 00:22, Mike Ayers wrote:
> > From: Peter Kirk [mailto:email@example.com]
> > Sent: Tuesday, December 14, 2004 3:37 PM
> > Thanks for the clarification. Perhaps the bifurcation could
> > be better expressed as into "strings of characters as defined
> > by the locale" and "strings of non-null octets". Then I could
> > re-express this as "the only safe way out of this mess is
> > never to process filenames as strings of characters as
> > defined by the locale".
> That would not be correct for ISO 8859 locales, though
> (amongst others). That's why I specified UTF-8. Although other
> locales may have the problem of invalid sequences, we're only
> interested in UTF-8 here.
But surely octets 0x80 to 0x9F are (at least mostly) invalid in ISO
8859? While some applications may choose to process these invalid
characters as if they were valid, displaying them as boxes or not at
all (which is a security risk), others, especially those concerned
with security, do in fact treat them as errors in one way or another.
For example, Marcin noted about Mozilla:
>If a filename ... can be
>converted but contains characters like 0x80-0x9F in ISO-8859-2,
>they are displayed as question marks and the file is inaccessible.
The fact that not all sequences of octets represent valid character
strings should be treated as a general issue affecting ALL locales and
character sets (with perhaps just a few exceptions). UTF-8 is by no
means a special case.
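As a rough illustration that "is this octet string a valid name?" is a per-encoding question and not a UTF-8 peculiarity, here is a minimal Python sketch. The function name decode_filename is hypothetical. Treating octets 0x80-0x9F as errors in ISO 8859 encodings is an assumption that mirrors the stricter policy described above: Python's own ISO 8859 codecs decode those octets to C1 control characters without complaint, so the sketch has to check for them explicitly. For UTF-8, Python's strict decoder already rejects malformed, overlong and surrogate sequences.

```python
import codecs
from typing import Optional


def decode_filename(raw: bytes, encoding: str) -> Optional[str]:
    """Hypothetical helper: return the decoded name, or None if `raw` is
    not a valid character string in `encoding`.

    Assumption (illustrative policy only): in ISO 8859 encodings the
    octets 0x80-0x9F are treated as invalid, even though Python maps
    them to C1 control characters U+0080-U+009F.
    """
    try:
        # errors="strict" raises on malformed input instead of
        # silently substituting replacement characters.
        text = raw.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return None

    # codecs.lookup() normalizes aliases, e.g. "latin-2" -> "iso8859-2".
    if codecs.lookup(encoding).name.startswith("iso8859"):
        if any(0x80 <= b <= 0x9F for b in raw):
            return None
    return text


if __name__ == "__main__":
    samples = [
        (b"caf\xc3\xa9", "utf-8"),     # well-formed UTF-8
        (b"caf\xe9", "utf-8"),         # truncated multi-byte sequence
        (b"caf\xe9", "iso-8859-2"),    # fine in Latin-2
        (b"a\x85b", "iso-8859-2"),     # C1 range, rejected by policy
        (b"\xc0\xaf", "utf-8"),        # overlong encoding of "/"
    ]
    for raw, enc in samples:
        print(enc, raw, "->", decode_filename(raw, enc))
```

The same octets can be valid in one encoding and invalid in another (b"caf\xe9" is acceptable as ISO-8859-2 but not as UTF-8), which is the sense in which every encoding, not just UTF-8, leaves some octet strings without a character interpretation.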
-- 
Peter Kirk
firstname.lastname@example.org (personal)
email@example.com (work)
http://www.qaya.org/