Re: Roundtripping Solved

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Wed Dec 15 2004 - 11:44:25 CST


    Peter Kirk <peterkirk@qaya.org> writes:

    > Jill, again your solution is ingenious. But would it not work just
    > as well for Lars' purposes to use, instead of your string of
    > random characters, just ONE reserved code point followed by U+0xx?
    > Instead of asking the UTC to allocate a specific code point for this
    > (which it probably will not do), he can use either U+FFFE or U+FFFF,
    > which "are intended for process internal uses, but are not permitted
    > for interchange." Let's call the one non-character chosen INVALID.

    Perhaps what is needed is a shift of viewpoint, not a big technical
    change.

    Don't call it a UTF. Call it escaping. Don't reserve 128 code points.
    Use an existing but rare code point as a prefix marking the next code
    point as an escaped byte, and escape the escape character itself if
    it occurs in the original. Perhaps the prefix could be ESC (27) or
    SUB (26), followed by U+00nn for the byte 0xnn.
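
    A minimal Python sketch of this escaping idea, assuming UTF-8 input
    and ESC (U+001B) as the prefix. The names `escape_bytes` and
    `unescape_to_bytes` are hypothetical, and using Python's
    `surrogateescape` error handler to locate undecodable bytes is an
    implementation choice of this sketch, not part of the proposal.
    A literal ESC is written as ESC + U+001B, so every escape is simply
    "ESC followed by the byte value".

```python
ESC = "\x1b"  # assumed escape prefix (ESC, 27); SUB ("\x1a") would work the same way


def escape_bytes(data: bytes) -> str:
    """Decode UTF-8, turning each undecodable byte 0xnn into ESC + U+00nn."""
    # surrogateescape maps each undecodable byte 0x80..0xFF to U+DC80..U+DCFF
    decoded = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in decoded:
        cp = ord(ch)
        if ch == ESC:
            # Escape the escape: a literal ESC becomes ESC + U+001B
            out.append(ESC + ESC)
        elif 0xDC80 <= cp <= 0xDCFF:
            # Undecodable byte 0xnn becomes ESC + U+00nn
            out.append(ESC + chr(cp - 0xDC00))
        else:
            out.append(ch)
    return "".join(out)


def unescape_to_bytes(text: str) -> bytes:
    """Invert escape_bytes, raising ValueError on a malformed escape."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ESC:
            nxt = ord(text[i + 1]) if i + 1 < len(text) else -1
            # escape_bytes only produces ESC + U+001B or ESC + U+0080..U+00FF
            if nxt != 0x1B and not 0x80 <= nxt <= 0xFF:
                raise ValueError(f"malformed escape at index {i}")
            out.append(nxt)
            i += 2
        else:
            out += ch.encode("utf-8")
            i += 1
    return bytes(out)
```

    Because each escape is ESC followed by exactly one code point,
    decoding is unambiguous. The escaped result is still ordinary
    Unicode text, which is also why other programs will accept it
    silently without knowing about the convention.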

    Well, a viewpoint shift doesn't solve all problems: it's still
    dangerous for interoperability. If the programmer doesn't do anything
    special when writing filenames to a file, then instead of an error,
    which would indicate that the goal has no natural solution, he gets
    an escaped string which will not be understood by other applications
    that don't use this convention. If the filename is passed to a part
    of the program which doesn't use this convention, it will break
    too. If something cannot be done reliably, it's better to signal the
    problem immediately than to hide it and misbehave later.
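
    For contrast, a fail-fast conversion in the spirit of the last
    sentence might look like this Python sketch, assuming the output file
    is UTF-8 text. `filename_for_output` is a hypothetical helper; it
    raises as soon as a filename cannot be represented instead of
    emitting an escaped string that other programs won't understand.

```python
def filename_for_output(raw: bytes) -> str:
    """Decode a raw filename for writing to a UTF-8 text file, failing loudly."""
    try:
        return raw.decode("utf-8")  # strict: no escaping, no replacement characters
    except UnicodeDecodeError as err:
        bad = raw[err.start:err.end]
        raise ValueError(
            f"filename {raw!r} is not valid UTF-8 "
            f"(bytes {bad!r} at offset {err.start}); refusing to write it as text"
        ) from err
```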

    -- 
       __("<         Marcin Kowalczyk
       \__/       qrczak@knm.org.pl
        ^^     http://qrnik.knm.org.pl/~qrczak/
    


    This archive was generated by hypermail 2.1.5 : Wed Dec 15 2004 - 11:54:12 CST