Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Asmus Freytag (
Date: Mon Mar 03 2003 - 00:10:39 EST


    At 07:21 AM 3/2/03 -0800, Mark Davis wrote:
    > > "C12a When a process interprets a code unit sequence which
    > > purports to be in a Unicode character encoding form, it
    > > shall treat ill-formed code unit sequences as an error
    > > condition, and shall not interpret such sequences as
    > > characters."

    Can we agree, one way or the other, on whether an API is conformant if it
    returns an error code but also fills an output buffer with a simplistic
    conversion of the erroneous sequence?

    To me it seems that by setting an error flag in the return code, the API
    has signalled that the user should not treat the output as containing
    correct Unicode.

    Such an API design (at a low enough level) might strike the right balance
    between usability in many different environments and satisfying the
    formal requirement.
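    A minimal sketch of the API shape described above, assuming Python. The
    `convert_utf8` function and `Status` enum are hypothetical names chosen for
    illustration, not any real library's interface; Python's built-in
    `bytes.decode` does the actual conversion. The output text is always
    filled in, and the status tells the caller whether it may be trusted.

```python
from enum import Enum
from typing import Tuple


class Status(Enum):
    OK = 0
    ILLFORMED = 1  # hypothetical error code: input was not well-formed UTF-8


def convert_utf8(data: bytes) -> Tuple[Status, str]:
    """Convert UTF-8 bytes to text, always returning an output string.

    On ill-formed input the status is ILLFORMED and the text is only a
    best-effort conversion; callers must check the status before treating
    the text as correct Unicode.
    """
    try:
        return Status.OK, data.decode("utf-8")
    except UnicodeDecodeError:
        # Simplistic conversion: ill-formed bytes are replaced with U+FFFD.
        return Status.ILLFORMED, data.decode("utf-8", errors="replace")


status, text = convert_utf8(b"a\xffb")
if status is not Status.OK:
    print("warning: input was not well-formed UTF-8:", repr(text))
```

    The design choice here is that the error flag, not the contents of the
    buffer, carries the conformance burden: the caller is told explicitly
    that the text is not a faithful interpretation of the input.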

    The ideal case is one where the converter stops in a restartable
    configuration, allowing the client to implement (or ask for) a variety of
    error-recovery options. However, such an interface requires a lot of
    thought and may be difficult to implement in some
    language/platform/library environments. Further, it may be unnecessarily
    difficult to use for at least some conceivable clients.
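    A minimal sketch of the restartable alternative, assuming Python. The
    names `convert_until_error`, `ConvResult` and `decode_with_policy` are
    hypothetical. The converter stops at the first ill-formed sequence and
    reports where it stopped; it relies on the `start` and `end` attributes
    that Python's `UnicodeDecodeError` provides. The client then decides how
    to recover and resumes from the reported offset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConvResult:
    text: str                 # characters decoded before the converter stopped
    consumed: int             # byte offset where decoding stopped
    error_end: Optional[int]  # end offset of the ill-formed bytes; None if no error


def convert_until_error(data: bytes, start: int = 0) -> ConvResult:
    """Decode from `start` until the end of input or the first ill-formed sequence."""
    try:
        return ConvResult(data[start:].decode("utf-8"), len(data), None)
    except UnicodeDecodeError as e:
        # e.start/e.end are relative to data[start:]; the prefix is well-formed.
        good = data[start:start + e.start].decode("utf-8")
        return ConvResult(good, start + e.start, start + e.end)


def decode_with_policy(data: bytes, policy: str = "strict") -> str:
    """Client-side recovery loop: 'strict' raises, 'replace' substitutes U+FFFD."""
    out, pos = [], 0
    while True:
        r = convert_until_error(data, pos)
        out.append(r.text)
        if r.error_end is None:
            return "".join(out)
        if policy == "strict":
            raise ValueError(f"ill-formed UTF-8 at byte offset {r.consumed}")
        out.append("\ufffd")  # 'replace' policy
        pos = r.error_end     # restart just after the ill-formed bytes


print(decode_with_policy(b"ab\xffcd", "replace"))
```

    Even this small sketch shows the trade-off: the converter itself is
    simple, but every client now has to write (or choose) a recovery loop.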


    This archive was generated by hypermail 2.1.5 : Mon Mar 03 2003 - 00:36:02 EST