Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Markus Scherer
Date: Mon Mar 03 2003 - 12:00:47 EST

    I am not sure yet how far I want to get into this discussion... but this seems worth mentioning:

    Asmus Freytag wrote:
    > The ideal case is one where the converter stops in a restartable
    > configuration, allowing the client to implement (or ask for) a variety
    > of error-recovery options.

    A nice description of ICU's conversion API, where a user-settable callback function determines what
    to do with ill-formed sequences and unmappable but legal sequences.
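
    To illustrate the general pattern (not ICU's actual C API), here is a minimal sketch using
    Python's standard codecs module, which offers a comparable user-settable error callback. The
    handler name "hex_escape", the function hex_escape_handler, and the %XNN output format are
    assumptions made up for this example. The decoder stops at the ill-formed sequence, hands the
    callback the offending byte range, and resumes wherever the callback says.

```python
import codecs

seen_spans = []  # records what the decoder identified as "the sequence" in error


def hex_escape_handler(err):
    """Error callback: return (replacement text, position at which to resume)."""
    if not isinstance(err, UnicodeDecodeError):
        raise err  # this sketch only handles decoding errors
    seen_spans.append((err.start, err.end))
    bad_bytes = err.object[err.start:err.end]
    # Hypothetical escape format chosen for this sketch.
    replacement = "".join(f"%X{b:02X}" for b in bad_bytes)
    return replacement, err.end  # restart decoding right after the bad bytes


# Register the callback under a name, then select it per conversion call.
codecs.register_error("hex_escape", hex_escape_handler)

data = b"ab\xffcd"  # 0xFF can never appear in well-formed UTF-8
text = data.decode("utf-8", errors="hex_escape")
print(text)
print(seen_spans)
```

    Note that the callback only sees the span the decoder chose to report; deciding what that
    span should be is the decoder's job, not the callback's.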

    The catch is "stops in a restartable configuration". One could discuss endlessly what the best such
    configuration is, depending on the particular input sequence and the expected probability of each
    error type. This is where it becomes much muddier, and where ICU cannot currently be tailored. If you
    don't like what it identified as "the sequence" that is in error or unmappable, then you are on your own.

    > However, such an interface requires a lot of
    > thought and may be difficult to implement for some
    > language/platform/library environments.

    This I take as a compliment...

    > Further, it may be unnecessarily
    > difficult to use for at least some conceivable clients.

    It is somewhat difficult to use when you do want to write your own error callback, but we provide
    canned ones for most of what has been discussed here - skip (omit) the sequence, replace with SUB
    (default), stop with an error code, replace with an escape sequence. Plus options.
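
    As a rough analogy only, Python's built-in error handlers cover the same four choices for
    decoding. This is a sketch of the idea, not of ICU's callbacks; the variable names are
    illustrative, and Python's substitution character for decoding is U+FFFD.

```python
data = b"ab\xffcd"  # one ill-formed byte in otherwise valid UTF-8

# Skip (omit) the ill-formed sequence.
skipped = data.decode("utf-8", errors="ignore")

# Replace it with a substitution character (U+FFFD in Python).
substituted = data.decode("utf-8", errors="replace")

# Replace it with an escape sequence.
escaped = data.decode("utf-8", errors="backslashreplace")

# Stop with an error; the exception carries the offending byte range.
try:
    data.decode("utf-8", errors="strict")
    stopped_at = None
except UnicodeDecodeError as err:
    stopped_at = (err.start, err.end)

print(repr(skipped), repr(substituted), repr(escaped), stopped_at)
```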

    Ok, one statement beyond this. Personal opinion: I think it is not necessary to _require_ any
    behavior for ill-formed sequences other than the ability to give an error code.


    Opinions expressed here may not reflect my company's positions unless otherwise noted.
