UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Feb 27 2003 - 15:53:59 EST

    Frank Tang responded to Kent Karlsson's response:

    > The problem I need to deal with is not how to GENERATE UTF-8, but how to
    > handle the DATA when my code receives it. For example, when I receive a
    > 10K UTF-8 file which has 1000 lines of text, and one UTF-8
    > sequence in line 990 is ill-formed, should I fire the "error" for
    > 1. the whole file (10K, 1000 lines),
    > 2. all the lines after line 899,
    > 3. line 990 itself,

    etc. etc.

    > If there are other ways you can scope the ERROR, I can probably go
    > on and on and tell you 10-20 other ways to scope it if I spend 20 more
    > minutes.
    > I do believe the error handling should be application specific.

    Absolutely. Error handling is a matter of software design, and not
    something mandated in detail by the Unicode Standard.

    If you write software which handles a GIF image, and there is
    a corrupted byte in the middle of a 118K GIF file, you don't go
    to the GIF specification itself
    to tell your software what to do after it has processed the first
    59K bytes (or whatever). The GIF specification just tells you
    what a well-formed GIF image is.

    Likewise, the Unicode Standard tells you what a well-formed
    UTF-8 byte sequence is. But it is the software designer who has
    to be smart about determining what his/her software will do when
    it encounters an error condition and finds itself dealing
    with a sequence which is ill-formed according to the specification
    of UTF-8 in the Unicode Standard.
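
    To make the point concrete, here is a minimal Python sketch of three
    of the scopes Frank lists. The function name decode_with_policy and
    the policy names "file", "line", and "replace" are hypothetical,
    invented for this illustration; nothing in the Unicode Standard
    prescribes any of them. The sketch relies only on Python's built-in
    strict UTF-8 decoder to decide what is ill-formed.

```python
from typing import List, Optional, Tuple


def decode_with_policy(data: bytes, policy: str = "line") -> Tuple[Optional[str], List[int]]:
    """Decode UTF-8 bytes under one of three illustrative error scopes.

    "file":    reject the whole input if any sequence is ill-formed.
    "line":    keep well-formed lines and report the ill-formed line numbers.
    "replace": substitute U+FFFD for ill-formed sequences and keep going.

    Returns (decoded text or None, list of 1-based ill-formed line numbers).
    """
    if policy == "file":
        try:
            return data.decode("utf-8"), []
        except UnicodeDecodeError:
            # One ill-formed sequence condemns the entire input.
            return None, []

    if policy == "replace":
        return data.decode("utf-8", errors="replace"), []

    if policy == "line":
        good: List[str] = []
        bad: List[int] = []
        # Splitting on byte 0x0A is safe: in well-formed UTF-8, bytes below
        # 0x80 never appear inside a multi-byte sequence.
        for number, raw in enumerate(data.split(b"\n"), start=1):
            try:
                good.append(raw.decode("utf-8"))
            except UnicodeDecodeError:
                bad.append(number)
        return "\n".join(good), bad

    raise ValueError("unknown policy: " + policy)


if __name__ == "__main__":
    sample = b"ok\n\xff bad\nfine"  # line 2 contains an ill-formed byte
    for name in ("file", "line", "replace"):
        print(name, decode_with_policy(sample, name))
```

    Which of these is right (if any) depends on the application: a
    compiler might want "file", a log viewer "line", and a display
    layer "replace". That choice is the designer's, not the standard's.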

