Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Mark Davis (mark.davis@jtcsv.com)
Date: Mon Mar 03 2003 - 16:07:23 EST


    > anything into the output buffer, even malformed Unicode, and still be

    If your converter purports to produce any one of the Unicode encoding forms,
    then it cannot conformantly produce malformed Unicode as a result.

    If, of course, it does not purport to do that, it can do anything it wants
    to.
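The quoted message raises the case of an unpaired surrogate in UTF-32. A UTF-32 code unit sequence is well formed only when every unit is a Unicode scalar value: it must lie in the range 0 to 0x10FFFF and outside the surrogate range 0xD800 to 0xDFFF. The Python sketch below shows the check that output from a conformant UTF-32 converter would always pass. The function name is hypothetical, and the sketch assumes the code units are plain integers whose byte order has already been resolved.

```python
# Minimal well-formedness check for a sequence of UTF-32 code units.
# Assumption: units are Python ints; byte order and BOM handling are out of scope.

SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_well_formed_utf32(code_units):
    """Return True if every code unit is a Unicode scalar value."""
    for unit in code_units:
        # Values outside the Unicode codespace are never valid UTF-32.
        if unit < 0 or unit > MAX_CODE_POINT:
            return False
        # Surrogate code points (paired or not) are not scalar values.
        if SURROGATE_FIRST <= unit <= SURROGATE_LAST:
            return False
    return True


# A single unpaired surrogate makes the whole sequence ill-formed.
print(is_well_formed_utf32([0x41, 0x10FFFF]))  # True
print(is_well_formed_utf32([0x41, 0xD800]))    # False
```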

    Mark
    ________
    mark.davis@jtcsv.com
    IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
    (408) 256-3148
    fax: (408) 256-0799

    ----- Original Message -----
    From: "Asmus Freytag" <asmusf@ix.netcom.com>
    To: "Mark Davis" <mark.davis@jtcsv.com>; "Kent Karlsson"
    <kentk@md.chalmers.se>; "'Michael (michka) Kaplan'" <michka@trigeminal.com>
    Cc: "'Yung-Fong Tang'" <ftang@netscape.com>; <unicode@unicode.org>
    Sent: Monday, March 03, 2003 12:17
    Subject: Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

    > At 11:52 AM 3/3/03 -0800, Mark Davis wrote:
    > >Perhaps I wasn't clear; I agree with you on that.
    > >
    > >1) It is conformant to skip or substitute text, with just a code at the end
    > >indicating that something of that sort was done.
    >
    > It's a subtle point, but can be put into your formulation:
    >
    > What I was after is where the "substitution" itself isn't legal Unicode,
    > i.e. an unpaired surrogate in UTF-32. My take is that, formally speaking,
    > as long as there's an indication of an error condition, I'm free to put
    > anything into the output buffer, even malformed Unicode, and still be
    > conformant.
    >
    > >2) Or, if someone wants more flexibility, to stop at possible errors, and
    > >give the client of the API information so that they can do more complex
    > >processing.
    > >
    > >Mark
    >
    >
    >
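The two options in the quoted message can be sketched in Python using the standard library's UTF-8 codec. The function names `decode_substituting` and `decode_stopping` are hypothetical. They illustrate the two strategies and do not reproduce the API of any converter discussed in the thread. The first function substitutes U+FFFD, so its output is always well-formed Unicode, and returns a flag as the "code at the end". The second function stops at the first error and reports where that error occurred, so the caller can do more complex processing.

```python
from typing import Optional, Tuple


def decode_substituting(data: bytes) -> Tuple[str, bool]:
    """Option 1: substitute U+FFFD for ill-formed input and flag that it happened."""
    try:
        return data.decode("utf-8"), False
    except UnicodeDecodeError:
        # "replace" inserts U+FFFD, so the result is still well-formed Unicode.
        return data.decode("utf-8", errors="replace"), True


def decode_stopping(
    data: bytes,
) -> Tuple[str, Optional[Tuple[int, int, str]]]:
    """Option 2: stop at the first error and tell the caller where it is.

    Returns (decoded_prefix, None) on success, or
    (decoded_prefix, (start, end, reason)) for the first ill-formed byte range.
    """
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as err:
        # Everything before err.start was decoded successfully in strict mode.
        prefix = data[: err.start].decode("utf-8")
        return prefix, (err.start, err.end, err.reason)


text, substituted = decode_substituting(b"ab\xffc")
# text == "ab\ufffdc", substituted is True

prefix, error = decode_stopping(b"ab\xffc")
# prefix == "ab", error[:2] == (2, 3)
```

Option 1 suits callers that only need readable text. Option 2 leaves the decision to the client, at the cost of a slightly more complex interface.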



    This archive was generated by hypermail 2.1.5 : Mon Mar 03 2003 - 16:41:16 EST