Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Mark Davis ([email protected])
Date: Mon Mar 03 2003 - 16:07:23 EST


    > anything into the output buffer, even malformed Unicode, and still be
    > conformant.

    If your converter purports to produce any one of the Unicode encoding forms,
    then it cannot conformantly produce malformed Unicode as a result.

    If, of course, it does not purport to do that, it can do anything it wants
    to.
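
    As a rough illustration of that distinction (a sketch only, using
    Python's built-in codecs as a stand-in for "a converter"; the function
    names encode_conformant and encode_anything are made up for this
    example): the strict encoder refuses an unpaired surrogate, while the
    'surrogatepass' error handler emits bytes that are not well-formed
    UTF-8, so the second function cannot claim to produce UTF-8.

```python
def encode_conformant(text, encoding="utf-8"):
    """Hypothetical helper: purports to produce a Unicode encoding form.

    Python's strict encoders raise UnicodeEncodeError on a lone
    surrogate instead of writing an ill-formed sequence.
    """
    return text.encode(encoding)


def encode_anything(text):
    """Hypothetical helper: makes no claim that the result is UTF-8.

    'surrogatepass' writes the surrogate as a three-byte sequence,
    which is NOT well-formed UTF-8.
    """
    return text.encode("utf-8", errors="surrogatepass")


if __name__ == "__main__":
    lone = "\ud800"  # an unpaired high surrogate
    try:
        encode_conformant(lone)
    except UnicodeEncodeError as exc:
        print("strict encoder refused:", exc.reason)
    print("surrogatepass wrote:", encode_anything(lone))
```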

    Mark
    ________
    [email protected]
    IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
    (408) 256-3148
    fax: (408) 256-0799

    ----- Original Message -----
    From: "Asmus Freytag" <[email protected]>
    To: "Mark Davis" <[email protected]>; "Kent Karlsson"
    <[email protected]>; "'Michael (michka) Kaplan'" <[email protected]>
    Cc: "'Yung-Fong Tang'" <[email protected]>; <[email protected]>
    Sent: Monday, March 03, 2003 12:17
    Subject: Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for
    review)

    > At 11:52 AM 3/3/03 -0800, Mark Davis wrote:
    > >Perhaps I wasn't clear; I agree with you on that.
    > >
    > >1) It is conformant to skip or substitute text, with just a code at the end
    > >indicating that something of that sort was done.
    >
    > It's a subtle point, but can be put into your formulation:
    >
    > What I was after is where the "substitution" itself isn't legal Unicode,
    > i.e. an unpaired surrogate in UTF-32. My take is that, formally speaking,
    > as long as there's an indication of an error condition, I'm free to put
    > anything into the output buffer, even malformed Unicode, and still be
    > conformant.
    >
    > >2) Or, if someone wants more flexibility, to stop at possible errors, and
    > >give the client of the API information so that they can do more complex
    > >processing.
    > >
    > >Mark
    >
    >
    >
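
    The two options quoted above, (1) substitute and report, and (2) stop
    at the error and hand the details to the caller, might look roughly
    like this for a UTF-8 decoder. This is a sketch, not a reference
    implementation: it leans on Python's built-in decoder, and the names
    decode_substituting and decode_stopping are invented for the example.
    In both cases the returned text is itself well-formed; the error is
    reported out of band.

```python
def decode_substituting(data):
    """Option 1: replace ill-formed input with U+FFFD and flag it.

    Returns (text, had_error). The flag comes from a strict decode
    attempt, so a legitimate U+FFFD in the input is not misreported.
    """
    try:
        return data.decode("utf-8"), False
    except UnicodeDecodeError:
        return data.decode("utf-8", errors="replace"), True


def decode_stopping(data):
    """Option 2: stop at the first ill-formed sequence.

    Returns (text_so_far, error_offset). error_offset is None when the
    whole input decoded cleanly; otherwise it is the byte index where
    decoding stopped, so the caller can decide how to continue.
    """
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # Everything before exc.start was valid, so this cannot fail.
        return data[:exc.start].decode("utf-8"), exc.start


if __name__ == "__main__":
    sample = b"ab\xffcd"  # 0xFF can never appear in UTF-8
    print(decode_substituting(sample))
    print(decode_stopping(sample))
```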


