RE: Roundtripping in Unicode

From: Lars Kristan
Date: Tue Dec 14 2004 - 09:12:02 CST

    Peter Kirk wrote:

    > Now no doubt many Unix filename handling utilities ignore the fact
    > that some octets are invalid or uninterpretable in the locale,
    > because they handle filenames as octet strings (with 0x00 and 0x2F
    > having special interpretations) rather than as locale-dependent
    > character strings. But these routines should continue to work in a
    > UTF-8 locale, as they make no attempt to interpret any octets other
    > than 0x00 and 0x2F.

    Hmmmmm, here lies the catch. According to UTC, you need to keep processing
    the UNIX filenames as BINARY data. And, also according to UTC, any UTF-8
    function is allowed to reject invalid sequences. Basically, you are not
    supposed to use strcpy to process filenames.

    Well, I just hope no one will listen to them and modify strcpy and strchr
    to validate the data when running in a UTF-8 locale and start signalling
    something (really, where and how?!). The two statements from UTC don't make
    sense when put together. Unless we are really expected to start building
    everything from scratch.
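
    To make the conflict concrete, here is a minimal sketch in C. It is only
    an illustration, not anything UTC or any libc prescribes; the names
    last_component and is_valid_utf8 are hypothetical. The first function
    treats a filename the way the traditional routines do, interpreting only
    0x2F and the terminating 0x00. The second is the kind of strict check
    that a UTF-8 function is allowed to apply.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last path component.
   Only 0x2F ('/') and the terminating 0x00 are interpreted; every
   other octet is passed through untouched. */
const char *last_component(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical strict check: returns 1 if s is well-formed UTF-8
   (no overlong forms, no surrogates, nothing above U+10FFFF), else 0. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned char lead = *p;
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd octet */
        int trail;

        if (lead < 0x80) {                  /* ASCII */
            p++;
            continue;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            trail = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            trail = 2;
            if (lead == 0xE0) lo = 0xA0;    /* reject overlong forms */
            if (lead == 0xED) hi = 0x9F;    /* reject surrogates */
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            trail = 3;
            if (lead == 0xF0) lo = 0x90;    /* reject overlong forms */
            if (lead == 0xF4) hi = 0x8F;    /* reject values above U+10FFFF */
        } else {
            return 0;                       /* stray trail octet, C0, C1, F5..FF */
        }

        p++;
        if (*p < lo || *p > hi)             /* also catches a premature 0x00 */
            return 0;
        p++;
        for (int i = 1; i < trail; i++, p++)
            if (*p < 0x80 || *p > 0xBF)
                return 0;
    }
    return 1;
}
```

    Take a name created in a Latin-1 locale, such as "/tmp/caf\xE9.txt".
    strcpy copies it octet for octet and last_component still finds
    "caf\xE9.txt", while is_valid_utf8 rejects the very same string. A strcpy
    that validated its input would have nowhere sensible to report that.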

    > All of this is ingenious, and may be useful for internal processing
    > within a Unix system, and perhaps even for interaction between
    > cooperating systems. But NOT-Unicode is not Unicode (!) and so
    > Unicode should not be expected to standardise it.

    Not by definition. But if it would help users by simplifying the
    transition, then why not?
