From: Jill Ramonsky (Jill.Ramonsky@Aculab.com)
Date: Tue Nov 04 2003 - 09:37:00 EST
What is a conforming application supposed to do if, when decoding a
UTF-8 stream (or indeed a UTF-32 stream, etc.), it encounters a sequence
of bytes which decodes to U+D800, U+DF00?
Of course, if such a sequence were encountered during UTF-16 processing
it would be pretty obvious, but I'm not talking UTF-16 any more. At
least, not directly. Nonetheless, such a sequence could arise if
Application A encodes text to a file using UTF-16, which is then read by
Application B (a very old, legacy application, unaware of the existence
of codepoints above U+FFFF) and re-saved in UTF-8.
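To make the scenario concrete, here is a sketch (my illustration, not part of the original question) of how Application B's naive re-encoding could produce exactly those bytes. U+D800, U+DF00 is the UTF-16 surrogate pair for U+10300, and a legacy application that treats each 16-bit unit as a separate character would encode each surrogate as a three-byte UTF-8-style sequence:

```python
# U+10300 (OLD ITALIC LETTER A) -- needs a surrogate pair in UTF-16.
text = "\U00010300"

# Application A: saves the text as UTF-16 (big-endian, no BOM for brevity).
utf16 = text.encode("utf-16-be")          # b'\xd8\x00\xdf\x00'

# Application B: unaware of codepoints above U+FFFF, it treats each
# 16-bit unit as a separate "character" and encodes each individually
# using the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx.
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
bogus = b"".join(
    bytes([0xE0 | (u >> 12),
           0x80 | ((u >> 6) & 0x3F),
           0x80 | (u & 0x3F)])
    for u in units
)
print(bogus.hex())  # eda080edbc80 -- this is CESU-8-style output, not UTF-8

# Python's UTF-8 decoder, like any conforming one, rejects surrogate
# code points in a UTF-8 stream:
try:
    bogus.decode("utf-8")
except UnicodeDecodeError as e:
    print("rejected:", e.reason)
```

Under the current standard such byte sequences are ill-formed UTF-8 (the surrogate range U+D800..U+DFFF is excluded), so a strict decoder refuses them rather than reassembling the pair.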
This question generalises to ... should /all/ encoding schemes treat
surrogate pairs as surrogate pairs, or just UTF-16?
This question generalises further still, to ... do the phrases
"surrogate character" and "surrogate pair" have any meaning whatsoever
outside of UTF-16?
This archive was generated by hypermail 2.1.5 : Tue Nov 04 2003 - 11:10:48 EST