RE: Roundtripping in Unicode

From: Lars Kristan
Date: Mon Dec 13 2004 - 07:00:18 CST

    Philippe Verdy wrote:
    > From: "Doug Ewell" <>
    > > Lars Kristan wrote:
    > >> I am sure one of the standardizers will find a Unicodally
    > >> correct way of putting it.
    > >
    > > I can't even understand that paragraph, let alone paraphrase it.
    > My understanding of his question, and my response to his problem,
    > is that you MUST not use VALID Unicode codepoints to represent
    > INVALID byte sequences found in some text with alleged UTF encoding.
    OK, so should the codepoints used for this purpose be valid or not? If the
    modified conversion were made standard and replaced the current UTF-16/32 to
    UTF-8 conversion, they would have a status close to that of surrogates, but
    not identical. They could be considered invalid by applications that
    absolutely need bijectivity, which is not always the case. So, in fact, many
    applications could and should consider them valid. That also means nothing
    changes for current applications, since they already consider them valid.
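
    To make the bijectivity point concrete, here is a minimal sketch of such a
    modified conversion. It is an illustration only, not the conversion proposed
    in this thread: the escape block (ESCAPE_BASE, an arbitrary private-use
    range), the error-handler name "escape-invalid", and the function names
    decode_lossless and encode_lossless are all assumptions made up for the
    example. Each byte that cannot be decoded as UTF-8 is mapped to one reserved
    codepoint, and the reverse conversion turns those codepoints back into raw
    bytes.

```python
import codecs

# Hypothetical choice: an arbitrary private-use block, picked only for
# illustration. Byte b (0x80..0xFF) is represented by ESCAPE_BASE + b.
ESCAPE_BASE = 0xEE00


def _escape_invalid(exc):
    """Decode error handler: map each undecodable byte to an escape codepoint."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(ESCAPE_BASE + b) for b in bad), exc.end


# "escape-invalid" is a name invented for this sketch.
codecs.register_error("escape-invalid", _escape_invalid)


def decode_lossless(data: bytes) -> str:
    """Bytes -> text; invalid UTF-8 bytes survive as escape codepoints."""
    return data.decode("utf-8", errors="escape-invalid")


def encode_lossless(text: str) -> bytes:
    """Text -> bytes; escape codepoints are written back as single raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if ESCAPE_BASE + 0x80 <= cp <= ESCAPE_BASE + 0xFF:
            out.append(cp - ESCAPE_BASE)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


# Any byte string survives bytes -> text -> bytes unchanged.
raw = b"abc\xff\xc3\xa9"
assert encode_lossless(decode_lossless(raw)) == raw

# The other direction is not bijective: two escape codepoints whose bytes
# happen to form valid UTF-8 come back as a different string.
tricky = chr(ESCAPE_BASE + 0xC3) + chr(ESCAPE_BASE + 0xA9)
assert decode_lossless(encode_lossless(tricky)) != tricky
```

    An application that needs text -> bytes -> text to be the identity would
    have to treat the escape codepoints as invalid input; one that only needs
    bytes -> text -> bytes can treat them as ordinary valid codepoints.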

    What I was talking about in the paragraph in question is what happens if you
    want to take unassigned codepoints and give them a new status. And this is
    precisely what happened with surrogates. We can discuss what things should be
    called in this context, what is valid at which point, and what the
    consequences are. But please note that I have abandoned this idea and am now
    pursuing a slightly different approach.

