RE: Roundtripping Solved

From: Lars Kristan (lars.kristan@hermes.si)
Date: Thu Dec 16 2004 - 08:33:21 CST

Next message: Lars Kristan: "RE: Roundtripping Solved"

Previous message: Arcane Jill: "RE: Roundtripping Solved"
Maybe in reply to: Arcane Jill: "Roundtripping Solved"
Next in thread: Lars Kristan: "RE: Roundtripping Solved"

Arcane Jill wrote:

> They are therefore nothing to do with Unicode or the UTC (... or even
> this list!).

This is one of the excuses UTC *can* use to stay out of this mess. I am
hoping they won't do that.

But I do not agree with you. Those functions can solve several problems, by
allowing:

* Retaining the relevant bits when (during conversion to Unicode strings)
encountering an unassigned character in some SBCS or an invalid sequence in
any MBCS, including, but not limited to, UTF-8. They also provide a means to
reliably reconstruct the data should the original be lost by the time the
problem is detected. As Marcin would say, it is better to prevent this in the
first place by signaling the problem when the conversion is done, but that
is not always practiced, nor is it always practical.

* Temporary coexistence of UTF-8 and legacy-encoded filenames on the same
filesystem, or within the same LAN. No matter how good the tools for
speeding up the migration are, it will take time, and the number of
legacy-encoded filenames will only decay exponentially, leaving a long tail.
Making the coexistence painful should (in theory) speed the migration up,
but it will not make the legacy filenames go away. It could, however, delay
the switch to UTF-8.

* Reliable manipulation of filenames even if they contain invalid UTF-8
sequences, thus reducing security risks and the load on IT departments.

* A simple way to fix any application that HAS to deal with non-validated
UTF-8 data, as opposed to declaring the data as binary and having to rewrite
existing code or, in the case of fresh development, implement functions,
transports and protocols to deal with it.
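
To make the list above concrete, here is a minimal sketch of such a pair of
conversion functions. It is not the proposal itself: it assumes Python's
built-in "surrogateescape" error handler as a stand-in, which maps each
invalid byte 0x80..0xFF to one of the 128 lone surrogates U+DC80..U+DCFF
instead of to newly assigned codepoints. The names to_text and to_bytes are
made up for the illustration.

```python
# Sketch only: Python's "surrogateescape" handler stands in for the
# conversion functions discussed above. It escapes invalid bytes to
# U+DC80..U+DCFF rather than to 128 newly assigned codepoints.

def to_text(raw: bytes) -> str:
    """Decode UTF-8, keeping each invalid byte as one escape code point."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Encode to UTF-8, turning escape code points back into the raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# A legacy (single-byte encoded) filename next to a UTF-8 one.
legacy_name = b"\xe8.txt"        # 0xE8 followed by '.' is not valid UTF-8
utf8_name = "č.txt".encode("utf-8")

legacy_text = to_text(legacy_name)   # the invalid byte survives as U+DCE8
utf8_text = to_text(utf8_name)       # valid UTF-8 decodes as usual

# Ordinary string manipulation works on both, and the bytes come back intact.
renamed = to_bytes(legacy_text.replace(".txt", ".bak"))
```

Both names pass through the same string-handling code, and the legacy name
can be reconstructed byte for byte afterwards, which is the roundtrip
property the list above relies on.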

All this should help Unicode (in general, and UTF-8 in UNIX filesystems in
particular) to be accepted faster and with less pain.

And that is something that definitely has something to do with both UTC and
this list.

> I'm not quite sure why Lars isn't happy with these suggestions

I already have a solution. I would be embarrassed if you managed to find a
better one overnight :)

> - maybe his goal has still not been clearly stated -

My goal is to verify the solution and possibly get the 128 codepoints
provided. Not just for me, but for anyone else who might find them useful.

Lars

This archive was generated by hypermail 2.1.5 : Thu Dec 16 2004 - 08:40:24 CST