Re: Roundtripping in Unicode

From: Marcin 'Qrczak' Kowalczyk (
Date: Sat Dec 11 2004 - 11:47:10 CST


    Lars Kristan <> writes:

    > All assigned codepoints do roundtrip even in my concept.
    > But unassigned codepoints are not valid data.

    Please make up your mind: either they are valid and programs are
    required to accept them, or they are invalid and programs are required
    to reject them.
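    To make the "accept or reject" choice concrete, here is a minimal
    sketch of a strict validator that follows the well-formed byte
    sequence table in chapter 3 of the Unicode Standard (Table 3-7). The
    function name is hypothetical, not taken from any library. Every
    input gets a yes/no answer, and the answer does not depend on who is
    asking. Python 3's built-in strict `bytes.decode("utf-8")` is expected
    to agree with it.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True iff `data` is well-formed UTF-8 (Unicode Table 3-7)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # ASCII: a single byte
            i += 1
            continue
        # For each lead byte: allowed range of the 2nd byte, number of trailing bytes
        if 0xC2 <= b <= 0xDF:
            lo, hi, count = 0x80, 0xBF, 1
        elif b == 0xE0:
            lo, hi, count = 0xA0, 0xBF, 2  # rejects overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            lo, hi, count = 0x80, 0xBF, 2
        elif b == 0xED:
            lo, hi, count = 0x80, 0x9F, 2  # rejects surrogates U+D800..U+DFFF
        elif b == 0xF0:
            lo, hi, count = 0x90, 0xBF, 3  # rejects overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            lo, hi, count = 0x80, 0xBF, 3
        elif b == 0xF4:
            lo, hi, count = 0x80, 0x8F, 3  # nothing above U+10FFFF
        else:
            return False  # C0, C1, F5..FF, or a stray continuation byte
        if i + count >= n:
            return False  # sequence truncated at end of input
        if not (lo <= data[i + 1] <= hi):
            return False
        for j in range(i + 2, i + count + 1):
            if not (0x80 <= data[j] <= 0xBF):
                return False
        i += count + 1
    return True
```

    For example, `b"caf\xc3\xa9"` is accepted, while an encoded surrogate
    (`b"\xed\xa0\x80"`) or an overlong slash (`b"\xc0\xaf"`) is rejected.
    Note that the check is purely structural: whether a code point is
    assigned plays no part in it.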

    > Furthermore, I was proposing this concept to be used, but not
    > unconditionally. So, you can, possibly even should, keep using
    > whatever you are using.

    So you prefer to make programs misbehave in unpredictable ways
    (when they pass the data from a component which uses relaxed rules
    to a component which uses strict rules) rather than have a clear and
    unambiguous notion of valid UTF-8?
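    The failure mode described above can be reproduced with Python's
    `surrogateescape` error handler (PEP 383, Python 3.1 and later), one
    real relaxed scheme of this general kind. It is used here only as a
    stand-in, not as an implementation of Lars Kristan's exact proposal;
    `lenient_read`, `strict_write` and the file name are hypothetical. The
    lenient component roundtrips the invalid byte, but the same string is
    refused as soon as it reaches a strict component.

```python
# Bytes that can never be well-formed UTF-8 (0xFF is never valid)
raw_name = b"report-\xff.txt"


def lenient_read(raw: bytes) -> str:
    # Relaxed rules: each invalid byte becomes a lone surrogate U+DC80..U+DCFF
    return raw.decode("utf-8", errors="surrogateescape")


def strict_write(text: str) -> bytes:
    # Strict rules: only well-formed UTF-8 may be produced
    return text.encode("utf-8")


name = lenient_read(raw_name)

# Inside the lenient component the original bytes roundtrip exactly
assert name.encode("utf-8", errors="surrogateescape") == raw_name

# Handing the same string to the strict component fails
try:
    strict_write(name)
    outcome = "written"
except UnicodeEncodeError:
    outcome = "rejected"

print(outcome)  # rejected
```

    Which component is "wrong" here is exactly the ambiguity at issue:
    each one is consistent with its own rules.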

    > Perhaps I can convert mine, but I cannot convert all filenames on
    > a user's system.

    Then you can't access his files.

    With your proposal you couldn't either, because you don't make them
    valid unconditionally. Some programs would access them and some would
    break, and it's not clear what should be fixed: the programs or the filenames.

       Marcin Kowalczyk

    This archive was generated by hypermail 2.1.5 : Sat Dec 11 2004 - 11:51:35 CST