Re: Roundtripping in Unicode

From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Dec 15 2004 - 05:51:51 CST

Next message: Lars Kristan: "RE: Roundtripping in Unicode"

Previous message: D. Starner: "RE: Roundtripping in Unicode"
In reply to: Mike Ayers: "RE: Roundtripping in Unicode"
Next in thread: Mike Ayers: "RE: Roundtripping in Unicode"

On 15/12/2004 00:22, Mike Ayers wrote:

>
> > From: Peter Kirk [mailto:peterkirk@qaya.org]
> > Sent: Tuesday, December 14, 2004 3:37 PM
>
> > Thanks for the clarification. Perhaps the bifurcation could
> > be better expressed as into "strings of characters as defined
> > by the locale" and "strings of non-null octets". Then I could
> > re-express this as "the only safe way out of this mess is
> > never to process filenames as strings of characters as
> > defined by the locale".
>
> That would not be correct for ISO 8859 locales, though
> (amongst others). That's why I specified UTF-8. Although other
> locales may have the problem of invalid sequences, we're only
> interested in UTF-8 here.
>

But surely octets 0x80 to 0x9f are (at least mostly) invalid in ISO
8859? Some applications may choose to process these invalid characters
as if they were valid, displaying them as boxes or not at all (which is
a security risk). Others, especially those concerned with security, do
in fact treat them as errors in one way or another. For example, Marcin
noted for Mozilla:

>If a filename ... can be
>converted but contains characters like 0x80-0x9F in ISO-8859-2,
>they are displayed as question marks and the file is inaccessible.
>

It should be treated as a general issue with ALL locales and character
sets (with perhaps just a few exceptions) that not all sequences of
octets represent valid character strings. UTF-8 is by no means a special
case here.
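
As a rough illustration of this point, here is a minimal Python sketch
that checks whether a sequence of octets is a valid character string in
a given charset. The function name is_valid_in_charset is hypothetical,
and treating 0x80-0x9F as errors is an assumption that follows the
stricter behaviour described above. Python's own ISO 8859 codecs accept
those octets and map them to the C1 control characters U+0080-U+009F,
so the sketch rejects them explicitly.

```python
def is_valid_in_charset(octets: bytes, encoding: str) -> bool:
    """Return True if octets decode strictly and yield no C1 controls.

    Hypothetical helper: rejecting C1 controls is a policy choice made
    for this sketch, not something the codecs enforce themselves.
    """
    try:
        text = octets.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        # Ill-formed sequence for this charset (e.g. broken UTF-8).
        return False
    # ISO 8859 codecs map octets 0x80-0x9F to U+0080-U+009F instead of
    # failing, so a strict checker has to look for them after decoding.
    return not any(0x80 <= ord(ch) <= 0x9F for ch in text)


if __name__ == "__main__":
    samples = [
        (b"caf\xe9", "iso8859_2"),     # valid ISO 8859-2
        (b"caf\xe9", "utf-8"),         # truncated UTF-8 sequence
        (b"bad\x85name", "iso8859_2"), # octet in the 0x80-0x9F range
        (b"caf\xc3\xa9", "utf-8"),     # valid UTF-8
    ]
    for octets, encoding in samples:
        print(octets, encoding, is_valid_in_charset(octets, encoding))
```

The same octets can pass in one charset and fail in another, and both
UTF-8 and ISO 8859-2 have inputs that fail, which is the sense in which
UTF-8 is not a special case.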

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Wed Dec 15 2004 - 06:05:40 CST