Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Tex Texin
Date: Thu Feb 27 2003 - 17:19:25 EST

    Hmm, is that true? Is it ok then, if I detect an unpaired surrogate, mutter
    "oops I have an error" and then drop that surrogate and continue processing
    the file, resulting in a valid utf-8 file?

    I thought for some reason this was prohibited, but if the standard does not
    prescribe error handling, then this seems legit.
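
    To make the "drop it and continue" idea concrete, here is a minimal
    Python sketch. It assumes the input is a list of UTF-16 code units;
    the function name and the returned drop count are illustrative only,
    not anything taken from the standard.

```python
def utf16_units_to_utf8(units):
    """Convert UTF-16 code units to UTF-8, dropping unpaired surrogates.

    Returns (utf8_bytes, dropped_count). Dropping is one possible policy
    chosen for this sketch; the caller decides what to do with the count.
    """
    chars = []
    dropped = 0
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: only valid when a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                chars.append(chr(cp))
                i += 2
                continue
            dropped += 1  # unpaired high surrogate: "oops", skip it
        elif 0xDC00 <= u <= 0xDFFF:
            dropped += 1  # low surrogate with no preceding high surrogate
        else:
            chars.append(chr(u))
        i += 1
    return "".join(chars).encode("utf-8"), dropped


# 'H', unpaired high, 'i', a valid pair (U+1F600), unpaired low
units = [0x48, 0xD800, 0x69, 0xD83D, 0xDE00, 0xDC00]
utf8_bytes, dropped = utf16_units_to_utf8(units)
```

    The output is well-formed UTF-8 either way; whether silently dropping
    data is acceptable is a separate design question, which is why the
    sketch reports how many code units it discarded.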


    Kenneth Whistler wrote:
    > Absolutely. Error handling is a matter of software design, and not
    > something mandated in detail by the Unicode Standard.
    > If you write software which handles a GIF image, and there is
    > a corrupted byte in the middle of a 118K GIF file, you don't go
    > to the GIF specification itself, e.g.,
    > to tell your software what to do after it has processed the first
    > 59K bytes (or whatever). The GIF specification just tells you
    > what a well-formed GIF image is.
    > Likewise, the Unicode Standard tells you what a well-formed
    > UTF-8 byte sequence is. But it is the software designer who has
    > to be smart about determining what his/her software will do when
    > it encounters an error condition and finds itself dealing
    > with a sequence which is ill-formed according to the specification
    > of UTF-8 in the Unicode Standard.
    > --Ken
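
    As an illustration of error handling being a design choice rather
    than something the standard prescribes, the sketch below decodes the
    same ill-formed byte sequence under three policies. It uses the
    `errors` argument of Python's built-in `bytes.decode`; the sample
    bytes and the helper name `decode_with_policy` are made up for this
    example.

```python
# 0xFF can never occur in well-formed UTF-8, so this input is ill-formed.
data = b"ab\xffcd"


def decode_with_policy(raw, policy):
    """Decode UTF-8 using one of Python's built-in error policies."""
    return raw.decode("utf-8", errors=policy)


# Policy 1: refuse the input outright.
try:
    decode_with_policy(data, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Policy 2: substitute U+FFFD REPLACEMENT CHARACTER for the bad byte.
replaced = decode_with_policy(data, "replace")

# Policy 3: drop the bad byte and carry on.
ignored = decode_with_policy(data, "ignore")
```

    All three behaviors start from the same definition of what counts as
    ill-formed; they differ only in what the software does next.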

    Tex Texin   cell: +1 781 789 1898
    Xen Master                
    Making e-Business Work Around the World
