RE: Roundtripping in Unicode

From: Lars Kristan (lars.kristan@hermes.si)
Date: Mon Dec 13 2004 - 07:00:18 CST

Next message: John Cowan: "Re: Nicest UTF"

Previous message: Lars Kristan: "RE: Nicest UTF"
Maybe in reply to: Lars Kristan: "RE: Roundtripping in Unicode"
Next in thread: Lars Kristan: "RE: Roundtripping in Unicode"

Philippe Verdy wrote:
> From: "Doug Ewell" <dewell@adelphia.net>
> > Lars Kristan wrote:
> >> I am sure one of the standardizers will find a Unicodally
> >> correct way of putting it.
> >
> > I can't even understand that paragraph, let alone paraphrase it.
>
> My understanding of his question and my response to his problem is that you
> MUST not use VALID Unicode codepoints to represent INVALID byte sequences
> found in some text with alleged UTF encoding.

OK, so should the codepoints used for this purpose be valid or not? If the
modified conversion were made standard and replaced the current UTF-16/32 to
UTF-8 conversion, they would have a status close to that of surrogates, but
not identical. Applications that absolutely need bijectivity could treat them
as invalid. Not every application needs that, so many applications could, and
should, treat them as valid. That also means nothing changes for current
applications, since they already consider them valid.
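
The bijectivity point can be illustrated with a minimal Python sketch. It
assumes a hypothetical escape range (ESCAPE_BASE, an arbitrary Private Use
Area value picked only for this example, not the range discussed in this
thread) and two hypothetical helpers, decode_with_escapes and
encode_with_escapes. Invalid bytes survive a round trip, but once the escape
codepoints are themselves accepted as valid input, two different byte
sequences decode to the same string:

```python
# Hypothetical escape range: byte b (0x80..0xFF) maps to ESCAPE_BASE + b.
# 0xEE00 is an arbitrary Private Use Area base, an assumption for illustration.
ESCAPE_BASE = 0xEE00


def decode_with_escapes(data: bytes) -> str:
    """Decode UTF-8, mapping each undecodable byte to an escape codepoint."""
    out = []
    i = 0
    while i < len(data):
        # Try the shortest valid UTF-8 sequence starting at position i.
        for length in (1, 2, 3, 4):
            chunk = data[i:i + length]
            try:
                out.append(chunk.decode("utf-8"))
                i += length
                break
            except UnicodeDecodeError:
                continue
        else:
            # No valid sequence starts here: escape this single byte.
            out.append(chr(ESCAPE_BASE + data[i]))
            i += 1
    return "".join(out)


def encode_with_escapes(text: str) -> bytes:
    """Encode to UTF-8, turning escape codepoints back into raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if ESCAPE_BASE + 0x80 <= cp <= ESCAPE_BASE + 0xFF:
            out.append(cp - ESCAPE_BASE)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


# An invalid byte sequence round-trips unchanged.
raw = b"a\x80b"
assert encode_with_escapes(decode_with_escapes(raw)) == raw

# But the valid UTF-8 encoding of the escape codepoint collides with it:
# two different byte strings decode to the same text, so the mapping
# from bytes to text is no longer one-to-one.
valid = chr(ESCAPE_BASE + 0x80).encode("utf-8")
assert valid != b"\x80"
assert decode_with_escapes(valid) == decode_with_escapes(b"\x80")
assert encode_with_escapes(decode_with_escapes(valid)) != valid
```

An application that needs a one-to-one mapping would have to reject the escape
codepoints on input. One that only needs to preserve invalid bytes can keep
accepting them.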

What I was talking about in the paragraph in question is what happens if you
want to take unassigned codepoints and give them a new status. This is
precisely what happened with surrogates. We can discuss what things should be
called in this context, what is valid at which point, and what the
consequences are. But please note that I have abandoned this idea and am now
pursuing a slightly different approach.

Lars


This archive was generated by hypermail 2.1.5 : Mon Dec 13 2004 - 07:05:14 CST