UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Feb 27 2003 - 15:53:59 EST

Next message: Yung-Fong Tang: "Re: Unicode 4.0 BETA available for review"

Previous message: Yung-Fong Tang: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Next in thread: Yung-Fong Tang: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Reply: Yung-Fong Tang: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Reply: Tex Texin: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"

Frank Tang responded to Kent Karlsson's response:

> The problem I need to deal with is not GENERATING that UTF-8, but how to
> handle this DATA when my code receives it. For example, when I receive a
> 10K UTF-8 file which has 1000 lines of text, and one UTF-8
> sequence in line 990 is ill-formed, should I fire the "error" for
> 1. the whole file (10K, 1000 lines),
> 2. all the lines after line 899,
> 3. line 990 itself,

etc. etc.

>
> There are other ways you can scope the ERROR; I could probably go on
> and on and tell you 10-20 other ways to scope it if I spent 20 more
> minutes.
>
> I do believe the error handling should be application specific.

Absolutely. Error handling is a matter of software design, and not
something mandated in detail by the Unicode Standard.
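To make Frank's scoping options concrete, here is a minimal Python sketch that applies three of them to line-oriented input. The function name `decode_lines` and the scope labels `"file"`, `"rest"` and `"line"` are hypothetical; the only thing taken from the standard is the notion of well-formedness, which Python's strict `utf-8` codec checks for us. Everything around that check is an application policy choice.

```python
from typing import List, Optional


def decode_lines(data: bytes, scope: str) -> Optional[List[str]]:
    """Decode UTF-8 `data` line by line under a hypothetical error scope.

    scope == "file": any ill-formed line rejects the whole file (returns None).
    scope == "rest": keep lines before the first bad line, drop it and all after.
    scope == "line": drop only the lines that contain ill-formed sequences.
    """
    if scope not in ("file", "rest", "line"):
        raise ValueError(f"unknown scope: {scope!r}")
    good: List[str] = []
    # 0x0A never occurs as a lead or continuation byte of a multi-byte
    # UTF-8 sequence, so splitting on it before decoding is safe.
    for raw in data.split(b"\n"):
        try:
            good.append(raw.decode("utf-8"))  # strict: raises on ill-formed input
        except UnicodeDecodeError:
            if scope == "file":
                return None
            if scope == "rest":
                return good
            continue  # scope == "line": skip only this line
    return good
```

For input such as `b"ok 1\nbad \xc3\x28 here\nok 3"` (0xC3 followed by a non-continuation byte), the three scopes give `None`, `["ok 1"]` and `["ok 1", "ok 3"]` respectively: same detection, three different answers, all equally conformant.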

If you write software which handles a GIF image, and there is
a corrupted byte in the middle of a 118K GIF file, you don't go
to the GIF specification itself, e.g.,
http://www.w3.org/Graphics/GIF/spec-gif87.txt
to tell your software what to do after it has processed the first
59K bytes (or whatever). The GIF specification just tells you
what a well-formed GIF image is.

Likewise, the Unicode Standard tells you what a well-formed
UTF-8 byte sequence is. But it is the software designer who has
to be smart about determining what his/her software will do when
it encounters an error condition and finds itself dealing
with a sequence which is ill-formed according to the specification
of UTF-8 in the Unicode Standard.
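The division of labor can be sketched in Python, assuming the byte ranges from the table of well-formed UTF-8 byte sequences in Chapter 3 of the Unicode Standard. The helper `first_ill_formed` is a hypothetical name: it answers only the specification's question (is this well-formed, and if not, where does it first fail?) and deliberately says nothing about what to do next.

```python
from typing import Optional


def first_ill_formed(data: bytes) -> Optional[int]:
    """Return the byte offset of the first ill-formed UTF-8 sequence, or None.

    Ranges follow the Unicode Standard's table of well-formed UTF-8 byte
    sequences: no overlong forms, no surrogates, nothing above U+10FFFF.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # ASCII
            i += 1
            continue
        # Pick the sequence length and the allowed range for the second byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # excludes surrogates D800..DFFF
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # excludes values above U+10FFFF
        else:
            return i  # C0, C1, F5..FF, or a stray continuation byte 80..BF
        if i + length > n:
            return i  # sequence truncated by end of input
        if not lo <= data[i + 1] <= hi:
            return i
        for k in range(2, length):  # remaining bytes must be plain continuations
            if not 0x80 <= data[i + k] <= 0xBF:
                return i
        i += length
    return None
```

A caller can then map the offset to a line number, e.g. `data.count(b"\n", 0, offset) + 1`, and reject the whole file, the rest of the file, or just that line, according to its own design.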

--Ken


This archive was generated by hypermail 2.1.5 : Thu Feb 27 2003 - 16:40:03 EST