RE: Roundtripping Solved

From: Lars Kristan (lars.kristan@hermes.si)
Date: Thu Dec 16 2004 - 08:53:59 CST

    Peter Kirk wrote:
    > So do you mean to relax the requirement "for all valid UTF-8 strings s8,
    > f(s8) = UTF-16(s8)"? The problem with this is that it is broken by
    > existing filenames which (probably by chance) form the UTF-8 for one of
    > your 128 replacement codepoints.

    Those are 'escaped' themselves.

    They do not convert to UTF-16 the way plain UTF-8 would, but the result is
    still valid UTF-16. And they do roundtrip.
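
A minimal sketch of the escaping idea described above, not the author's actual implementation. It assumes the 128 replacement codepoints sit at U+A680..U+A6FF (the range suggested later in this message, not an assigned block for this purpose) and that each byte 0x80..0xFF that cannot be decoded maps to base + (byte - 0x80). The names `ESC_BASE`, `bytes_to_text` and `text_to_bytes` are hypothetical. A Python `str` stands in for UTF-16 here; every replacement codepoint is in the BMP, so each one would be a single UTF-16 code unit.

```python
# Hypothetical base of the 128 replacement code points. U+A680 is only the
# range proposed in this message, not something assigned for this purpose.
ESC_BASE = 0xA680


def is_escape(cp: int) -> bool:
    """True if cp is one of the 128 assumed replacement code points."""
    return ESC_BASE <= cp < ESC_BASE + 0x80


def bytes_to_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, escaping every byte that cannot be kept as-is."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:  # ASCII passes through unchanged
            out.append(chr(b))
            i += 1
            continue
        ch = None
        # Try the shortest valid multi-byte sequence starting at position i.
        for length in (2, 3, 4):
            chunk = raw[i:i + length]
            if len(chunk) < length:
                break
            try:
                ch = chunk.decode("utf-8")
                break
            except UnicodeDecodeError:
                continue
        if ch is not None and not is_escape(ord(ch)):
            out.append(ch)
            i += length
        else:
            # Either invalid UTF-8, or valid UTF-8 for a replacement code
            # point: escape this byte alone, so the escapes are themselves
            # escaped and the original bytes stay recoverable.
            out.append(chr(ESC_BASE + b - 0x80))
            i += 1
    return "".join(out)


def text_to_bytes(text: str) -> bytes:
    """Inverse direction: replacement code points turn back into raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if is_escape(cp):
            out.append(cp - ESC_BASE + 0x80)
        else:
            out += ch.encode("utf-8")
    return bytes(out)
```

In this sketch a filename whose bytes happen to be the UTF-8 form of U+A680 (EA 9A 80) does not decode to U+A680. It decodes to three replacement codepoints, one per byte, which is still valid UTF-16 and converts back to the same three bytes. Only the bytes-to-text-to-bytes direction is guaranteed to roundtrip here.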

    > Well, there are not 128 replacement codepoints, and never will be,
    > certainly not in the BMP - unless you are talking about unpaired
    > surrogates or the PUA.

    Never say never. But then again, the PUA is in the BMP. If the UTC really
    wants to make a mess, they will assign from outside the BMP, which would be
    a less-than-perfect solution. In that case I would rather see the PUA
    solution become the silent agreement.

    But U+A680..U+A6FF would be perfect. If any other range is taken I risk
    being assassinated by a nation that would be pushed out of the BMP because
    of me.
    A6 on the other hand is Yi extensions. No guarantee that A5 and A6 will
    suffice indefinitely. So, Yi will eventually extend to other Planes anyway.

    Lars


