Re: RE: Roundtripping in Unicode

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Dec 13 2004 - 23:21:50 CST

    Philippe VERDY wrote:

    > (In fact I also think that mapping invalid sequences to U+FFFD is also
    > an error, because U+FFFD is valid, and the presence of the encoding
    > error in the source is lost, and will not throw exceptions in further
    > processings of the remapped text, unless the application constantly
    > checks for the presence of U+FFFD in the text stream, and all modules
    > in the application explicitly forbids U+FFFD within its interface...)

    Mapping invalid sequences to U+FFFD is explicitly permitted by
    conformance clause C12a (TUS 4.0, p. 61):

    "When faced with [an] ill-formed code unit sequence while transforming
    or interpreting text, a conformant process must treat the first code
    unit... as an illegally terminated code unit sequence -- for example, by
    signaling an error, filtering the code unit out, or representing the
    code unit with a marker such as U+FFFD REPLACEMENT CHARACTER."
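
    As a rough illustration only, the three options named in that clause
    (signal an error, filter the code unit out, or substitute U+FFFD)
    correspond closely to the "strict", "ignore", and "replace" error
    handlers of Python's built-in UTF-8 decoder. The function names and the
    sample bytes below are my own, chosen for the sketch; the sample is the
    110xxxxx 0xxxxxxx pattern, a lead byte followed by an ASCII byte.

```python
# Hypothetical sample: lead byte 0xC3 (110xxxxx) followed by 0x41 (0xxxxxxx),
# where a continuation byte (10xxxxxx) was required.
ILL_FORMED = b"\xc3\x41"


def decode_signaling(data: bytes) -> str:
    """Option 1: signal an error (raises UnicodeDecodeError if ill-formed)."""
    return data.decode("utf-8", errors="strict")


def decode_filtering(data: bytes) -> str:
    """Option 2: filter the offending code unit out of the result."""
    return data.decode("utf-8", errors="ignore")


def decode_marking(data: bytes) -> str:
    """Option 3: represent the offending code unit with U+FFFD."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        decode_signaling(ILL_FORMED)
    except UnicodeDecodeError as exc:
        print("error signaled:", exc.reason)
    print(repr(decode_filtering(ILL_FORMED)))
    print(repr(decode_marking(ILL_FORMED)))
```

    Only the first option preserves the fact that the source was
    ill-formed; the other two yield an ordinary, valid string.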

    Of course, any subsequent process that handles this text would have to
    understand this convention, and not choke if handed a U+FFFD.
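
    A minimal sketch of what "understanding the convention" might look like
    in a downstream module, assuming it receives already-decoded text. Both
    helper names are hypothetical. Note that neither can tell a U+FFFD that
    marks a decoding error from one that was legitimately present in the
    source, which is the loss of information Philippe describes.

```python
REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def count_replacements(text: str) -> int:
    """Tolerant consumer: accept the text, but report how many markers it has."""
    return text.count(REPLACEMENT_CHARACTER)


def require_no_replacements(text: str) -> str:
    """Strict consumer: refuse any text that contains U+FFFD."""
    position = text.find(REPLACEMENT_CHARACTER)
    if position != -1:
        raise ValueError(f"U+FFFD found at index {position}")
    return text


if __name__ == "__main__":
    sample = "ab" + REPLACEMENT_CHARACTER + "cd"
    print(count_replacements(sample))
    try:
        require_no_replacements(sample)
    except ValueError as exc:
        print("rejected:", exc)
```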

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


