RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

From: Lars Kristan (lars.kristan@hermes.si)
Date: Sat Dec 11 2004 - 05:47:50 CST

Next message: Johannes Bergerhausen: "Re: US-ASCII (was: Re: Invalid UTF-8 sequences)"

Previous message: Michael Everson: "Re: US-ASCII (was: Re: Invalid UTF-8 sequences)"
Maybe in reply to: Doug Ewell: "Invalid UTF-8 sequences (was: Re: Nicest UTF)"
Next in thread: Lars Kristan: "RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)"

John Cowan wrote:
> However, although they are *technically* octet sequences, they
> are *functionally* character strings. That's the issue.
Nicely put! But UTC does not seem to care.

>
> > The point I'm making is that *whatever* you do, you are still
> > asking for implementers to obey some convention on conversion
> > failures for corrupt, uninterpretable character data.
> > My assessment is that you'd have no better success at making
> > this work universally well with some set of 128 magic bullet
> > corruption pills on Plane 14 than you have with the
> > existing Quoted-Unprintable as a convention.
>
> It doesn't have to work universally; indeed, it becomes a QOI issue.
> Allocating representations of bytes with "bits that are high" makes
> it possible to do something recoverable, at very little expense to the
> Unicode Consortium.
Except that the expense should be slightly higher. The importance of these
replacement codepoints is still underestimated. They belong in the BMP. And
at least there is no way anyone can blame UTC for a cultural bias in this
case: these codepoints are universal.
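
To make the "recoverable" part concrete, here is a minimal sketch of the general idea: each byte that cannot be decoded as UTF-8 is mapped to one of 128 reserved codepoints, so the original octets can be restored later. It assumes Python's built-in "surrogateescape" error handler, which uses U+DC80..U+DCFF for this purpose. That range is Python's own convention, not the BMP or Plane 14 allocation discussed in this thread, and the function names are only illustrative.

```python
def bytes_to_text(raw: bytes) -> str:
    # Valid UTF-8 decodes normally. Each undecodable byte 0x80..0xFF
    # becomes one codepoint in U+DC80..U+DCFF (Python's convention,
    # standing in here for a hypothetical block of 128 codepoints).
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_bytes(text: str) -> bytes:
    # Reverse step: escaped codepoints turn back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


# A file name that is mostly ASCII but contains two stray high bytes.
raw = b"caf\xe9 \xff ok"
text = bytes_to_text(raw)

# The string is usable as text, and the round trip loses nothing.
assert text_to_bytes(text) == raw
```

Note that the intermediate string is not valid Unicode for interchange (it contains lone surrogates), which is one reason dedicated codepoints are being asked for here.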

>
> > Further, as it turns out that Lars is actually asking for
> > "standardizing" corrupt UTF-8, a notion that isn't going to
> > fly even two feet, I think the whole idea is going to be
> > a complete non-starter.
>
> I agree that that part won't fly, absolutely.
Then I'll have to restructure it.

Lars


This archive was generated by hypermail 2.1.5 : Sat Dec 11 2004 - 05:52:50 CST