RE: Roundtripping in Unicode

From: Lars Kristan (lars.kristan@hermes.si)
Date: Tue Dec 14 2004 - 09:12:02 CST

Next message: Edward H. Trager: "UTF-8 vs. Non-UTF-8 Locales and File Names (WAS: Re: Roundtripping in Unicode)"

Previous message: Lars Kristan: "RE: Roundtripping in Unicode"
Maybe in reply to: Lars Kristan: "RE: Roundtripping in Unicode"
Next in thread: Lars Kristan: "RE: RE: Roundtripping in Unicode"
Maybe reply: Lars Kristan: "RE: RE: Roundtripping in Unicode"
Maybe reply: Philippe VERDY: "Re: RE: Roundtripping in Unicode"
Reply: Marcin 'Qrczak' Kowalczyk: "Re: Roundtripping in Unicode"

Peter Kirk wrote:

> Now no doubt many Unix filename handling utilities ignore the fact that
> some octets are invalid or uninterpretable in the locale, because they
> handle filenames as octet strings (with 0x00 and 0x2F having special
> interpretations) rather than as locale-dependent character strings. But
> these routines should continue to work in a UTF-8 locale, as they make
> no attempt to interpret any octets other than 0x00 and 0x2F.

Hmm, here lies the catch. According to the UTC, you need to keep processing
UNIX filenames as binary data. And, also according to the UTC, any UTF-8
function is allowed to reject invalid sequences. Taken together: if strcpy
counts as a UTF-8 function in a UTF-8 locale, then basically you are not
supposed to use it to process filenames.
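
To make the octet-string view concrete, here is a minimal sketch in C. The
helper names last_component and copy_preserves_octets are hypothetical and
exist only for this illustration; the only library calls used are the
standard strrchr, strcpy, strlen and memcmp. The point is that these
routines interpret nothing except 0x2F and the terminating 0x00, so a name
that is not valid UTF-8 passes through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last component of a path.
 * The path is treated as an opaque octet string; only 0x2F ('/') and the
 * terminating 0x00 are interpreted. */
const char *last_component(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical helper: copy a filename with strcpy and report whether
 * every octet, including the terminator, arrived unchanged.
 * Returns 0 if the buffer is too small, 1 if the copy is identical. */
int copy_preserves_octets(const char *name, char *buf, size_t bufsize)
{
    assert(name != NULL && buf != NULL);
    size_t len = strlen(name);
    if (len + 1 > bufsize)
        return 0;
    strcpy(buf, name);  /* no validation, no locale lookup */
    return memcmp(buf, name, len + 1) == 0;
}
```

For a name such as "dir/caf\xE9.txt" (a Latin-1 e-acute, which is not
well-formed UTF-8), both helpers behave exactly as they would for an ASCII
name.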

Well, I just hope no one will listen to them and modify strcpy and strchr to
validate the data when running in a UTF-8 locale and start signalling
something (really, where and how?!). The two statements from the UTC don't
make sense when put together. Unless we are really expected to start
building everything from scratch.
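
For contrast, this is roughly what a validating function looks like. It is
only a sketch: utf8_is_well_formed is a hypothetical name, not a libc
routine, and the checks (continuation octets, overlong forms, surrogates,
the U+10FFFF limit) are my reading of the well-formedness rules. Note that
it needs a return value to say "rejected", which is exactly the channel
that strcpy and strchr do not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical validator: return 1 if the n octets at s are well-formed
 * UTF-8, 0 otherwise. */
int utf8_is_well_formed(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t need;            /* continuation octets expected */
        unsigned long cp, min;  /* code point and smallest legal value */

        if (c < 0x80) { i++; continue; }  /* ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;  /* stray continuation octet, or 0xF8..0xFF */

        if (n - i - 1 < need)
            return 0;   /* sequence truncated at end of input */
        for (size_t k = 1; k <= need; k++) {
            unsigned char t = s[i + k];
            if ((t & 0xC0) != 0x80)
                return 0;  /* not a continuation octet */
            cp = (cp << 6) | (t & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF)
            return 0;   /* overlong form or beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogate code point */
        i += need + 1;
    }
    return 1;
}
```

Given the octets of "caf\xE9.txt" this returns 0, while the same name
encoded as "caf\xC3\xA9.txt" returns 1. A caller can act on that result; a
strcpy that validated internally would have nowhere to report it.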

> All of this is ingenious, and may be useful for internal processing
> within a Unix system, and perhaps even for interaction between
> cooperating systems. But NOT-Unicode is not Unicode (!) and so Unicode
> should not be expected to standardise it.

Not by definition. But if it would help users by simplifying the
transition, then why not?

Lars

This archive was generated by hypermail 2.1.5 : Tue Dec 14 2004 - 09:15:30 CST