Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Yung-Fong Tang (ftang@netscape.com)
Date: Fri Feb 28 2003 - 13:21:01 EST

Next message: Mete Kural: "Unicode Arabic Rendering Problem"

Previous message: Yung-Fong Tang: "Re: Unicode 4.0 BETA available for review"
In reply to: Kenneth Whistler: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Next in thread: Michael (michka) Kaplan: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Reply: Michael (michka) Kaplan: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Reply: Markus Scherer: "Re: UTF-8 Error Handling"

Kenneth Whistler wrote:

>Think of it this way. Does anyone expect the ASCII standard to tell,
>in detail, what a process should or should not do if it receives
>data which purports to be ASCII, but which contains an 0x80 byte
>in it? All the ASCII standard can really do is tell you that
>0x80 is not defined in ASCII, and a conformant process shall not
>interpret 0x80 as an ASCII character. Beyond that, it is up to
>the software engineers to figure out who goofed up in mislabelling
>or corrupting the data, and what the process receiving the bad data
>should do about it.
>
That is not a good comparison. ASCII is a single-byte character code
standard. When I get a 0x80 in an ASCII string, I know where the
boundary is: the whole 8 bits of that 0x80 are bad. The scope is not
the first 3 bits or 9 bits, but the 8 bits of data. I cannot tell
whether the rest of the data is good or bad, but I know that an ASCII
character occupies 8 bits and 8 bits only.

The same goes for JIS X 0208 (a TWO and only TWO byte character set,
not a variable-length character set). If I am processing an ISO-2022-JP
message in JIS X 0208 mode and I get a 0x24 0xa8, I know the boundary
of that problem is 16 bits, not 8 bits or 32 bits.
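
To make the fixed-width point concrete, here is a minimal Python sketch. The function name fixed_width_errors and the simple per-byte range check are assumptions made for illustration; a real JIS X 0208 validator would also check which code points are actually assigned. It only shows that the size of each reported error is known in advance from the width of the code unit.

```python
def fixed_width_errors(data: bytes, width: int, lo: int, hi: int) -> list[tuple[int, int]]:
    """Return (offset, length) for every bad code unit in a fixed-width encoding.

    Hypothetical helper: a unit is "bad" if any of its bytes falls outside
    lo..hi, or if the data ends in the middle of a unit.
    """
    errors = []
    for i in range(0, len(data), width):
        unit = data[i:i + width]
        if len(unit) < width or any(not lo <= b <= hi for b in unit):
            # The scope of the error is the unit itself, nothing more.
            errors.append((i, len(unit)))
    return errors


# ASCII: one byte per character, valid range 0x00..0x7F.
ascii_result = fixed_width_errors(b"ab\x80cd", 1, 0x00, 0x7F)

# JIS X 0208 in 7-bit ISO-2022-JP mode: two bytes per character,
# each byte in 0x21..0x7E (range check only, an assumption of this sketch).
jis_result = fixed_width_errors(bytes([0x24, 0x22, 0x24, 0xA8, 0x24, 0x24]), 2, 0x21, 0x7E)
```

In both calls the error length is fixed by the encoding: 1 byte for the 0x80, and 2 bytes for the 0x24 0xa8 pair.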

When you deal with encodings that need state (ISO-2022, ISO-2022-JP,
etc.) or variable-length encodings (Shift_JIS, Big5, UTF-8), the
situation is different.
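
One way to see the difference for UTF-8 is the rough Python sketch below. The function utf8_error_scope is hypothetical and deliberately simplified: it looks only at the lead byte and at generic continuation bytes (0x80..0xBF), and ignores the tighter second-byte ranges that apply after the lead bytes E0, ED, F0 and F4. It returns two numbers that can disagree, and that disagreement is the boundary question.

```python
def utf8_error_scope(data: bytes, i: int) -> tuple[int, int]:
    """Return (claimed_len, consistent_len) for the sequence starting at data[i].

    claimed_len:    length announced by the lead byte (0 if the byte cannot
                    start a sequence at all).
    consistent_len: number of bytes from i that actually fit that claim.
    Simplified sketch, not a full validator.
    """
    lead = data[i]
    if lead < 0x80:
        claimed = 1
    elif 0xC2 <= lead <= 0xDF:
        claimed = 2
    elif 0xE0 <= lead <= 0xEF:
        claimed = 3
    elif 0xF0 <= lead <= 0xF4:
        claimed = 4
    else:
        # Stray continuation byte or a byte that is never valid in UTF-8.
        return (0, 1)

    n = 1
    # Count the continuation bytes that really follow the lead byte.
    while n < claimed and i + n < len(data) and 0x80 <= data[i + n] <= 0xBF:
        n += 1
    return (claimed, n)


# 0xE3 announces a 3-byte sequence, but only 0x81 follows before an ASCII 'b'.
scope = utf8_error_scope(b"a\xe3\x81b", 1)
```

Here the lead byte claims 3 bytes while only 2 are consistent with that claim, so a decoder has to choose whether the bad span is 1, 2, or 3 bytes long. Nothing in the data alone settles that choice.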

This archive was generated by hypermail 2.1.5 : Fri Feb 28 2003 - 14:01:24 EST