Re: Encoding of non-characters

From: Doug Ewell (dewell@compuserve.com)
Date: Sat Jul 29 2000 - 13:40:30 EDT

Next message: Mark Davis: "Re: Encoding of non-characters"
Previous message: Mark Davis: "Re: Encoding of non-characters"
Maybe in reply to: Doug Ewell: "Encoding of non-characters"
Next in thread: Mark Davis: "Re: Encoding of non-characters"
Reply: Mark Davis: "Re: Encoding of non-characters"

Mark Davis <markdavis@ispchannel.com> wrote:

> Here is the issue. Because of the prevalence of UTF-16, and to
> preserve the round-tripping of UTFs to and from UTF-16 (even UTF-16
> containing mal-formed text containing non-characters and/or unpaired
> surrogates), a UTF must always roundtrip all codepoints between 0 and
> 10FFFF, inclusive.

Wait, now I'm lost. It was precisely *because* of UTF-16 that I thought
it was OK not to round-trip U+D800 through U+DFFF. After all, this is
a characteristic of UTF-16 itself. For example, it cannot round-trip
the following illegal sequence of four UCS-2 (pre-UTF-16) code points:

U+DC00 U+D800 U+DC00 U+D800

UTF-16 would regard this as the unpaired low surrogate U+DC00, followed
by the perfectly legal U+10000, followed by the unpaired high surrogate
U+D800. If I really intended to have four unpaired surrogates, I can't
use UTF-16 to represent them.
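
To make the example concrete, here is a rough Python sketch of a lenient
UTF-16 encoder/decoder pair. The function names are made up for
illustration, and the "pass unpaired surrogates through unchanged" policy
is an assumption about how a tolerant converter might behave, not
something any particular implementation is known to do.

```python
def encode_utf16_units(code_points):
    """Encode code points as 16-bit code units (hypothetical lenient encoder).

    Assumption: surrogate code points are written out unchanged
    instead of being rejected.
    """
    units = []
    for cp in code_points:
        if cp >= 0x10000:
            v = cp - 0x10000
            units.append(0xD800 + (v >> 10))    # high surrogate
            units.append(0xDC00 + (v & 0x3FF))  # low surrogate
        else:
            units.append(cp)
    return units


def decode_utf16_units(units):
    """Decode 16-bit code units to code points (hypothetical lenient decoder).

    Assumption: an unpaired surrogate is passed through as itself.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # High surrogate followed by low surrogate: one supplementary code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


original = [0xDC00, 0xD800, 0xDC00, 0xD800]
round_tripped = decode_utf16_units(encode_utf16_units(original))
print([f"U+{cp:04X}" for cp in round_tripped])
```

Four code points go in and three come out, because the middle D800 DC00
is indistinguishable from the surrogate pair for U+10000.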

> It is of course permissible for a UTF converter to offer an option to
> detect and throw an error on any mal-formed text.

Then is it a conformance requirement to round-trip malformed text
(including illegal Unicode code points), or isn't it?
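
For comparison, the strict option mentioned above might look something
like the following Python sketch. The name check_well_formed is
hypothetical, and it only looks for unpaired surrogates; it makes no
attempt to detect non-characters such as U+FFFE or U+FFFF.

```python
def check_well_formed(units):
    """Raise ValueError at the first unpaired surrogate in a list of
    16-bit code units (hypothetical strict-mode check)."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            raise ValueError(f"unpaired high surrogate U+{u:04X} at index {i}")
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate reached here was not preceded by a high surrogate.
            raise ValueError(f"unpaired low surrogate U+{u:04X} at index {i}")
        i += 1


check_well_formed([0x0041, 0xD800, 0xDC00])  # well-formed, no error

try:
    check_well_formed([0xDC00, 0xD800, 0xDC00, 0xD800])
except ValueError as err:
    print(err)
```

A converter with this check enabled cannot round-trip the sequence at
all, which is exactly why the conformance question matters.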

-Doug Ewell
Fullerton, California

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT