RE: Roundtripping in Unicode

From: Lars Kristan (lars.kristan@hermes.si)
Date: Wed Dec 15 2004 - 05:58:49 CST


Marcin 'Qrczak' Kowalczyk wrote:
> But it's not possible in the direction NOT-UTF-16 -> NOT-UTF-8 ->
> NOT-UTF-16, unless you define valid sequences of NOT-UTF-16 in an
> awkward way which would happen to exclude those subsequences of
> non-characters which would form a valid UTF-8 fragment.
NOT-UTF-16 -> NOT-UTF-8 -> NOT-UTF-16 was never a goal. Nor was UTF-16 ->
NOT-UTF-8 -> UTF-16, or NOT-UTF-16 -> UTF-8 -> NOT-UTF-16.

UTF-16 -> UTF-8 -> UTF-16 is preserved, and that keeps the goals of the UTFs
intact.

The goal, BTW, is: NOT-UTF-8 -> UTF-16 -> NOT-UTF-8.
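
To make the directions concrete, here is a minimal sketch. It is not the
scheme discussed in this thread; it borrows Python's `surrogateescape` error
handler, which maps each undecodable byte to a lone surrogate in
U+DC80..U+DCFF, as a stand-in for a similar idea. The sample bytes and the
variable names are made up for the illustration.

```python
# Sketch only: Python's "surrogateescape" handler stands in for the idea of
# carrying invalid UTF-8 bytes through a 16-bit representation.

# NOT-UTF-8: the bytes 0xFF 0xFE are invalid, the trailing C3 A9 is a valid "é".
raw = b"abc\xff\xfe\xc3\xa9"

# NOT-UTF-8 -> string: each invalid byte becomes a lone surrogate U+DCxx.
text = raw.decode("utf-8", errors="surrogateescape")

# string -> 16-bit code units; "surrogatepass" lets lone surrogates through.
utf16 = text.encode("utf-16-le", errors="surrogatepass")

# 16-bit code units -> string -> NOT-UTF-8: the original bytes come back.
back = utf16.decode("utf-16-le", errors="surrogatepass").encode(
    "utf-8", errors="surrogateescape"
)
assert back == raw

# The direction that was never a goal: start from escape code units that
# happen to spell a valid UTF-8 fragment (C3 A9).
odd = "\udcc3\udca9"
reverse = odd.encode("utf-8", errors="surrogateescape").decode(
    "utf-8", errors="surrogateescape"
)
assert reverse == "\u00e9"  # decoded as "é"
assert reverse != odd       # so the original code units are not recovered
```

The last three lines show the case Marcin describes: escape code units that
together form a valid UTF-8 fragment do not survive the trip in that direction.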

> Question: should a new programming language which uses Unicode for
> string representation allow non-characters in strings? Argument for
> allowing them: otherwise they are completely useless at all, except
> U+FFFE for BOM detection. Argument for disallowing them: they make
> UTF-n inappropriate for serialization of arbitrary strings, and thus
> non-standard extensions of UTF-n must be used for serialization.
My opinion:
It should allow them and process them usefully. Furthermore, what 'usefully'
means should not be left for developers to discover. It should be researched,
described, and in the end perhaps even standardized. IMHO, the UTC should
consider leading this process, even if it does not end with anything
standardized in the Unicode Standard.

Validation should be completely separated from processing. IMHO.
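
One way to picture that separation, as a rough sketch with hypothetical
helper names (`is_noncharacter` and `find_noncharacters` are invented here,
not taken from any library): ordinary string operations accept
non-characters without complaint, and checking for them is a distinct step
that a caller runs only when it cares.

```python
# Sketch only: helper names are hypothetical.

def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for U+xxFFFE / U+xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def find_noncharacters(s):
    """Validation step: report the positions of non-characters, reject nothing."""
    return [i for i, ch in enumerate(s) if is_noncharacter(ord(ch))]

s = "a\ufdd0b\uffff"

# Processing: operations work on the string as-is.
upper = s.upper()
encoded = s.encode("utf-8")

# Validation: run separately, and only where the caller wants it.
positions = find_noncharacters(s)
```

Whether a serializer should then refuse such a string is a policy decision
layered on top; the processing code does not need to know about it.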

Lars
