From: Lars Kristan (lars.kristan@hermes.si)
Date: Mon Dec 13 2004 - 07:00:18 CST
Philippe Verdy wrote:
> From: "Doug Ewell" <dewell@adelphia.net>
> > Lars Kristan wrote:
> >> I am sure one of the standardizers will find a Unicodally
> >> correct way of putting it.
> >
> > I can't even understand that paragraph, let alone paraphrase it.
>
> My understanding of his question and my response to his problem is that
> you MUST not use VALID Unicode codepoints to represent INVALID byte
> sequences found in some text with alleged UTF encoding.
OK, should the codepoints for this purpose be valid or not? If the modified
conversion were made standard and replaced the current UTF-16/32 to UTF-8
conversion, they would have a status close to that of surrogates, though
not identical. Applications that absolutely need bijectivity could consider
them invalid. But bijectivity is not always needed, so many applications
could and should consider them valid. That also means nothing changes for
current applications, since they already consider them valid.
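
To make the bijectivity concern concrete, here is a minimal Python sketch of
such a modified conversion. The escape range (U+F780..U+F7FF in the Private
Use Area), the error-handler name, and both function names are assumptions
made only for illustration; nothing in this thread fixes them.

```python
import codecs

# Hypothetical escape range: byte 0xNN maps to U+F7NN. Chosen for illustration only.
ESCAPE_BASE = 0xF700


def _escape_bytes(err):
    """Error handler: turn each undecodable byte into one escape code point."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(chr(ESCAPE_BASE + b) for b in bad), err.end


codecs.register_error("escape-sketch", _escape_bytes)


def decode_lenient(data: bytes) -> str:
    """UTF-8 to string; invalid bytes become escape code points."""
    return data.decode("utf-8", errors="escape-sketch")


def encode_lenient(text: str) -> bytes:
    """String to UTF-8; escape code points become the raw bytes they stand for."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if ESCAPE_BASE + 0x80 <= cp <= ESCAPE_BASE + 0xFF:
            out.append(cp - ESCAPE_BASE)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


raw = b"a\x80b"                      # 0x80 on its own is not valid UTF-8
standard = "\uf780".encode("utf-8")  # ordinary UTF-8 for the escape code point

# An invalid byte survives a round trip through the string form.
assert encode_lenient(decode_lenient(raw)) == raw

# Two different byte sequences decode to the same string ...
assert decode_lenient(b"\x80") == decode_lenient(standard)
# ... so the ordinary encoding does not survive the round trip.
assert encode_lenient(decode_lenient(standard)) != standard
```

In this sketch, an application that needs bijectivity would have to treat the
escape code points as invalid on input, while one that only needs to carry
invalid bytes through unchanged can keep treating them as valid.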
What I was talking about in the paragraph in question is what happens if you
want to take unassigned codepoints and give them a new status. This is
precisely what happened with surrogates. We can discuss what things should
be called in this context, what is valid at which point, and what the
consequences are. But please note that I have abandoned this idea and am now
pursuing a slightly different approach.
Lars