From: Lars Kristan (lars.kristan@hermes.si)
Date: Thu Dec 16 2004 - 08:53:59 CST
Peter Kirk wrote:
> So do you mean to relax the requirement "for all valid UTF-8 strings s8,
> f(s8) = UTF-16(s8)"? The problem with this is that it is broken by
> existing filenames which (probably by chance) form the UTF-8 for one of
> your 128 replacement codepoints.
Those are 'escaped' themselves.
They don't translate to UTF-16 the way ordinary UTF-8 would, but they still
form valid UTF-16. And they do roundtrip.
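
To make the escaping concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not a specification: the names BASE, to_text and to_bytes are made up for this example, and BASE = 0xA680 simply borrows the range suggested further down. Each byte that is not part of valid UTF-8 maps to one of the 128 replacement codepoints, and a valid sequence that happens to encode a replacement codepoint is escaped byte by byte, so the conversion back to bytes is unambiguous.

```python
# Sketch only: BASE and both function names are assumptions for illustration.
BASE = 0xA680          # hypothetical start of the 128 replacement codepoints
LAST = BASE + 0x7F     # hypothetical end of that range


def _escape(byte: int) -> str:
    """Map one byte in 0x80..0xFF to its replacement codepoint."""
    return chr(BASE + (byte - 0x80))


def to_text(raw: bytes) -> str:
    """Convert arbitrary bytes to text, escaping whatever is not plain UTF-8."""
    out = []
    i = 0
    while i < len(raw):
        decoded = None
        length = 0
        # Look for the shortest valid UTF-8 sequence starting at position i.
        for n in (1, 2, 3, 4):
            chunk = raw[i:i + n]
            if len(chunk) < n:
                break
            try:
                decoded = chunk.decode("utf-8")
                length = n
                break
            except UnicodeDecodeError:
                continue
        if decoded is None:
            # Invalid byte (always >= 0x80): escape it on its own.
            out.append(_escape(raw[i]))
            i += 1
        elif BASE <= ord(decoded) <= LAST:
            # Valid UTF-8 for a replacement codepoint: escape each byte,
            # so it cannot be confused with an escaped invalid byte.
            out.extend(_escape(b) for b in raw[i:i + length])
            i += length
        else:
            out.append(decoded)
            i += length
    return "".join(out)


def to_bytes(text: str) -> bytes:
    """Inverse of to_text for any string that to_text produced."""
    out = bytearray()
    for ch in text:
        if BASE <= ord(ch) <= LAST:
            out.append(ord(ch) - BASE + 0x80)
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)
```

With these definitions, to_bytes(to_text(raw)) gives back raw for any byte string, including a filename that by chance contains the UTF-8 for a replacement codepoint. The guarantee is only in that direction: an arbitrary string that already contains replacement codepoints need not survive text to bytes to text unchanged.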
> Well, there are not 128 replacement codepoints, and never will be,
> certainly not in the BMP - unless you are talking about unpaired
> surrogates or the PUA.
Never say never. But then again, the PUA is in the BMP. If the UTC really
wants to make a mess, they will assign from outside the BMP, which would be
a less-than-perfect solution. In that case I would rather see the PUA
solution become the silent agreement.
But U+A680..U+A6FF would be perfect. If any other range is taken I risk
being assassinated by a nation that would be pushed out of the BMP because of me.
A6 on the other hand is Yi extensions. No guarantee that A5 and A6 will
suffice indefinitely. So, Yi will eventually extend to other Planes anyway.
Lars