Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Mark Davis ([email protected])
Date: Mon Mar 03 2003 - 14:52:46 EST

Next message: Yung-Fong Tang: "Re: Unicode Arabic Rendering Problem"

Previous message: Asmus Freytag: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
In reply to: Asmus Freytag: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Next in thread: Asmus Freytag: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Reply: Asmus Freytag: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"

Perhaps I wasn't clear; I agree with you on that.

1) It is conformant to skip or substitute ill-formed text, with just a code at
the end indicating that something of that sort was done.

2) Or, if someone wants more flexibility, it is conformant to stop at possible
errors and give the client of the API enough information to do more complex
processing.
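The two conformant options can be sketched with Python 3's built-in UTF-8 codec. This is an illustration under stated assumptions, not any particular library's converter API: the names `decode_substituting`, `decode_or_stop` and `ConversionError` are hypothetical. Only `errors="replace"`, `UnicodeDecodeError` and its `start`/`end`/`reason` attributes are standard Python.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def decode_substituting(data: bytes) -> Tuple[str, bool]:
    """Option 1 (hypothetical helper): convert everything, substituting
    U+FFFD for ill-formed sequences, and report a single flag at the end."""
    text = data.decode("utf-8", errors="replace")
    try:
        data.decode("utf-8")  # strict decoding fails iff the input is ill-formed
        had_error = False
    except UnicodeDecodeError:
        had_error = True
    return text, had_error


@dataclass
class ConversionError:  # hypothetical error record
    reason: str
    good_end: int  # byte offset where the last well-formed sequence ends
    bad_end: int   # byte offset where the ill-formed sequence ends


def decode_or_stop(data: bytes) -> Tuple[str, Optional[ConversionError]]:
    """Option 2 (hypothetical helper): stop at the first ill-formed sequence
    and tell the caller where it is, so the caller can decide what to do."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as e:
        # Everything before e.start was decoded successfully.
        prefix = data[:e.start].decode("utf-8")
        return prefix, ConversionError(e.reason, e.start, e.end)
```

Decoding twice in option 1 keeps the sketch short; a real converter would set the flag in a single pass.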

Mark
________
[email protected]
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Asmus Freytag" <[email protected]>
To: "Mark Davis" <[email protected]>; "Kent Karlsson"
<[email protected]>; "'Michael (michka) Kaplan'" <[email protected]>
Cc: "'Yung-Fong Tang'" <[email protected]>; <[email protected]>
Sent: Monday, March 03, 2003 11:21
Subject: Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

> But, formally speaking, is it conformant for an API to not stop, and merely
> raise an error flag (that the caller may or may not look at)?
>
> I argue that it is.
>
> A./
>
> At 09:09 AM 3/3/03 -0800, Mark Davis wrote:
> >Asmus has good points about restartability: it gives the API user the
> >maximal flexibility, yet many times users don't want to futz with such
> >options and just want the text converted.
> >
> >To provide maximal flexibility, an API will give the choice, for illegal
> >sequences, of (1) deleting, (2) substituting (a character, an escape such as
> >"&#xABC;", or other options), or (3) stopping with information: the reason
> >for the error, the end position of the last successfully converted sequence,
> >and the end position of the bad sequence. Users may also want to distinguish
> >between illegal sequences and missing characters (characters with no mapping
> >in the target encoding) in applying these options; that is, they may want to
> >silently delete illegal sequences, but substitute a replacement character
> >for missing characters.
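As a rough illustration of applying different policies to the two error kinds, Python's standard codec error handlers happen to map onto the choices above: `"ignore"` deletes, `"replace"` substitutes, `"xmlcharrefreplace"` (encoding only) escapes, and the default `"strict"` stops. The function names below are hypothetical; this is a sketch of the idea, not a full converter API.

```python
def convert_to_legacy(data: bytes, target: str) -> bytes:
    """Hypothetical helper: UTF-8 -> legacy charset with separate policies."""
    # Illegal UTF-8 sequences in the input: delete them silently.
    text = data.decode("utf-8", errors="ignore")
    # Characters missing from the target charset: substitute (codec emits b"?").
    return text.encode(target, errors="replace")


def convert_with_escapes(data: bytes, target: str) -> bytes:
    """Hypothetical helper: same, but escape missing characters instead."""
    text = data.decode("utf-8", errors="ignore")
    # Missing characters become XML numeric character references, e.g. b"&#233;".
    return text.encode(target, errors="xmlcharrefreplace")
```

For example, `convert_to_legacy(b"caf\xc3\xa9 \xff ok", "ascii")` drops the illegal `\xff` byte but turns the well-formed, unmappable "é" into `?`, giving `b"caf?  ok"`.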
> >
> >Mark
> >________
> >[email protected]
> >IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
> >(408) 256-3148
> >fax: (408) 256-0799
> >
> >----- Original Message -----
> >From: "Asmus Freytag" <[email protected]>
> >To: "Mark Davis" <[email protected]>; "Kent Karlsson"
> ><[email protected]>; "'Michael (michka) Kaplan'" <[email protected]>
> >Cc: "'Yung-Fong Tang'" <[email protected]>; <[email protected]>
> >Sent: Sunday, March 02, 2003 21:10
> >Subject: Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)
> >
> >
> > > At 07:21 AM 3/2/03 -0800, Mark Davis wrote:
> > > > > "C12a When a process interprets a code unit sequence which
> > > > > purports to be in a Unicode character encoding form, it
> > > > > shall treat ill-formed code unit sequences as an error
> > > > > condition, and shall not interpret such sequences as
> > > > > characters."
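A concrete case for C12a is an overlong or surrogate encoding: the bytes must be reported as an error rather than read as a character. Python 3's UTF-8 codec in its default strict mode behaves this way; the helper name `is_well_formed_utf8` is hypothetical.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if strict UTF-8 decoding succeeds."""
    try:
        data.decode("utf-8")  # strict mode: ill-formed input raises
    except UnicodeDecodeError:
        return False
    return True


# C0 AF is an overlong encoding of "/"; it must be an error, never "/".
print(is_well_formed_utf8(b"\xc0\xaf"))
# ED A0 80 would encode the surrogate U+D800, which is not a scalar value.
print(is_well_formed_utf8(b"\xed\xa0\x80"))
# A well-formed sequence for comparison.
print(is_well_formed_utf8("caf\u00e9".encode("utf-8")))
```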
> > >
> > > Can we agree on whether an API that returns an error code, but also an
> > > output buffer containing a simplistic conversion of the erroneous
> > > sequence, is conformant or not?
> > >
> > > To me it seems that by setting an error flag in the return code, the API
> > > has signalled that the user should not treat the output as containing
> > > correct Unicode.
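The design under discussion, an API that never stops but returns a status code alongside a best-effort output buffer, might look roughly like the sketch below. Assumptions: Python 3; `decode_lenient` and `Status` are hypothetical names; reinterpreting the offending bytes as Latin-1 is just one possible "simplistic conversion", chosen for illustration.

```python
from enum import Enum
from typing import List, Tuple


class Status(Enum):  # hypothetical status codes
    OK = 0
    ILLEGAL_SEQUENCE = 1


def decode_lenient(data: bytes) -> Tuple[str, Status]:
    """Never stops: ill-formed bytes are converted simplistically (as Latin-1)
    and the problem is reported only through the returned status."""
    parts: List[str] = []
    status = Status.OK
    pos = 0
    while True:
        try:
            parts.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as e:
            # e.start / e.end are offsets relative to data[pos:].
            parts.append(data[pos:pos + e.start].decode("utf-8"))
            parts.append(data[pos + e.start:pos + e.end].decode("latin-1"))
            status = Status.ILLEGAL_SEQUENCE
            pos += e.end  # e.end > e.start, so the loop always advances
    return "".join(parts), status
```

A caller that ignores the status gets readable but possibly wrong text; as argued above, a non-OK status means the buffer should not be treated as correct Unicode.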
> > >
> > > Such an API design (at a low enough level) might strike the right balance
> > > between usability in many different environments and satisfying the
> > > formal requirement.
> > >
> > > The ideal case is one where the converter stops in a restartable
> > > configuration, allowing the client to implement (or ask for) a variety of
> > > error-recovery options. However, such an interface requires a lot of
> > > thought and may be difficult to implement for some
> > > language/platform/library environments. Further, it may be unnecessarily
> > > difficult to use for at least some conceivable clients.
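A restartable converter can be approximated by stopping at each ill-formed sequence and handing control to a client-supplied recovery policy. This is a minimal sketch assuming Python 3; `convert_restartable` and the callback shape `(reason, bad_start, bad_end) -> replacement or None` are hypothetical, not an existing API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback: given the error reason and the absolute byte offsets
# of the bad sequence, return replacement text, or None to stop converting.
RecoveryPolicy = Callable[[str, int, int], Optional[str]]


def convert_restartable(data: bytes, policy: RecoveryPolicy) -> Tuple[str, int]:
    """Return (converted text, number of bytes consumed)."""
    out: List[str] = []
    pos = 0
    while pos < len(data):
        try:
            out.append(data[pos:].decode("utf-8"))
            pos = len(data)
        except UnicodeDecodeError as e:
            bad_start, bad_end = pos + e.start, pos + e.end
            # Everything before the bad sequence is converted; stop here.
            out.append(data[pos:bad_start].decode("utf-8"))
            replacement = policy(e.reason, bad_start, bad_end)
            if replacement is None:
                return "".join(out), bad_start  # caller may resume from here
            out.append(replacement)
            pos = bad_end
    return "".join(out), pos
```

The same converter then supports deleting (`lambda *_: ""`), substituting (`lambda *_: "\ufffd"`), or stopping (`lambda *_: None`), which shows why the restartable design gives the client the most flexibility, at the cost of a more involved interface.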
> > >
> > > A./
> > >
> > >
>
>
>

This archive was generated by hypermail 2.1.5 : Mon Mar 03 2003 - 15:43:08 EST