From: Lars Kristan (lars.kristan@hermes.si)
Date: Tue Dec 14 2004 - 09:12:02 CST
Peter Kirk wrote:
> Now no doubt many Unix filename handling utilities ignore the fact that
> some octets are invalid or uninterpretable in the locale, because they
> handle filenames as octet strings (with 0x00 and 0x2F having special
> interpretations) rather than as locale-dependent character strings. But
> these routines should continue to work in a UTF-8 locale, as they make
> no attempt to interpret any octets other than 0x00 and 0x2F.
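To illustrate the octet-string handling described in the quoted text, here is a minimal sketch in C. The helper name last_component is hypothetical (it is not a libc function); it assumes only that 0x2F separates components and 0x00 terminates the name, and it never decodes any other octet.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Returns a pointer to the final path component, treating the name as an
 * opaque octet string: only 0x2F ('/') and the terminating 0x00 are
 * interpreted. No locale or encoding is consulted. */
static const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');   /* plain byte search */
    return slash ? slash + 1 : path;
}
```

Called on a name such as "dir/\xE9t\xE9.txt" (Latin-1 octets that are not well-formed UTF-8), this should return the seven octets after the slash unchanged, whatever the current locale is.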
Hmmmmm, here lies the catch. According to the UTC, you need to keep processing
UNIX filenames as BINARY data. And, also according to the UTC, any UTF-8
function is allowed to reject invalid sequences. Basically, you are not
supposed to use strcpy to process filenames.
Well, I just hope no one will listen to them and modify strcpy and strchr to
validate the data when running in a UTF-8 locale and start signalling
something (really, where and how?!). The two statements from the UTC don't make
sense when put together. Unless we are really expected to start building
everything from scratch.
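The tension can be made concrete with a small sketch in C. The function utf8_is_valid below is a hypothetical strict validator written for this illustration (it is not part of any standard library); it is assumed to stand in for "a UTF-8 function that rejects invalid sequences". The standard strcpy and strchr, by contrast, only look at octets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strict validator, for illustration only.
 * Returns true only if s is well-formed UTF-8: it rejects stray
 * continuation bytes, overlong forms, surrogates (U+D800..U+DFFF)
 * and values above U+10FFFF. */
static bool utf8_is_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        unsigned char c = *p;
        size_t n;                            /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* range for the first of them */

        if (c < 0x80) { p++; continue; }     /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }   /* no overlongs  */
        else if (c == 0xED) { n = 2; hi = 0x9F; }   /* no surrogates */
        else if (c >= 0xE1 && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }   /* no overlongs  */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }   /* max U+10FFFF  */
        else return false;                   /* 0x80..0xC1, 0xF5..0xFF */

        p++;
        for (size_t i = 0; i < n; i++, p++) {
            /* A terminating 0x00 here also fails the range check. */
            if (*p < lo || *p > hi) return false;
            lo = 0x80; hi = 0xBF;            /* later bytes: full range */
        }
    }
    return true;
}
```

For a legacy name such as "caf\xE9.txt", strcpy copies every octet and strchr finds no '/', while utf8_is_valid returns false. If the string functions themselves performed this check, they would have no obvious way to report the failure, which is the "where and how?!" problem.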
> All of this is ingenious, and may be useful for internal processing
> within a Unix system, and perhaps even for interaction between
> cooperating systems. But NOT-Unicode is not Unicode (!) and so Unicode
> should not be expected to standardise it.
Not by definition. But if it would help users by simplifying the
transition, then why not?
Lars