Re: UTF-8 Error Handling

From: Markus Scherer ([email protected])
Date: Fri Feb 28 2003 - 16:02:48 EST

Previous message: [email protected]: "Re: Unicode Arabic Rendering Problem"
In reply to: Yung-Fong Tang: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"

Yung-Fong Tang wrote:
> Same thing for JIS X 0208 (a two-byte, and only two-byte, character set,
> not a variable-length character set). If I am processing an ISO-2022-JP
> message in JIS X 0208 mode and I get 0x24 0xa8, I know the boundary of
> that problem is 16 bits, not 8 bits or 32 bits.

Not true. You cannot tell which of these happened:
- a byte was dropped before or after 0x24
  -> the first sequence is only 1 byte
- a byte was corrupted to become 0xa8
  -> the sequence consists of two bytes
- some wild combination of multiple errors

With a single-unit encoding, you can always assume that an illegal unit is a one-unit error. With
any multi-unit encoding, you can only guess how many units the error spans.
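
A rough sketch of why the guess is needed, in Python. The helper names and every byte after
0x24 0xa8 are made up for illustration; this is not taken from any real decoder. It assumes that
in two-byte 7-bit mode each byte of a character lies in 0x21..0x7E, and it re-pairs the bytes that
follow the error under three different assumptions about how long the error is.

```python
# Illustrative sketch only: helper names and the trailing byte values are assumptions.

def is_legal_unit(byte):
    """In 7-bit two-byte mode, each byte of a character is assumed to be 0x21..0x7E."""
    return 0x21 <= byte <= 0x7E

def pair_up(data):
    """Split bytes into two-byte characters; return (pairs, leftover bytes)."""
    end = len(data) - len(data) % 2
    pairs = [(data[i], data[i + 1]) for i in range(0, end, 2)]
    return pairs, data[end:]

def resync(data, error_start, error_len):
    """Assume the error spans error_len bytes, then re-pair whatever follows."""
    return pair_up(data[error_start + error_len:])

# 0x24 0xA8 is the illegal sequence from the quoted message; the rest is invented.
stream = bytes([0x24, 0xA8, 0x24, 0x22, 0x24, 0x24, 0x24, 0x26])

for error_len in (1, 2, 3):
    pairs, leftover = resync(stream, 0, error_len)
    all_legal = all(is_legal_unit(a) and is_legal_unit(b) for a, b in pairs)
    print(error_len, [f"{a:02X}{b:02X}" for a, b in pairs], leftover.hex(), all_legal)
```

With these particular bytes, assuming a two-byte error and assuming a three-byte error both leave
only legal-looking pairs, but they are different pairs. Nothing in the byte values says which
pairing the sender intended.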

markus

-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.

