From: Doug Ewell (dewell@adelphia.net)
Date: Mon Dec 13 2004 - 23:21:50 CST
Philippe VERDY wrote:
> (In fact I also think that mapping invalid sequences to U+FFFD is
> an error, because U+FFFD is valid, and the presence of the encoding
> error in the source is lost, and will not throw exceptions in further
> processing of the remapped text, unless the application constantly
> checks for the presence of U+FFFD in the text stream, and all modules
> in the application explicitly forbid U+FFFD within their interfaces...)
Mapping invalid sequences to U+FFFD is explicitly permitted by
conformance clause C12a (TUS 4.0, p. 61):
"When faced with [an] ill-formed code unit sequence while transforming
or interpreting text, a conformant process must treat the first code
unit... as an illegally terminated code unit sequence -- for example, by
signaling an error, filtering the code unit out, or representing the
code unit with a marker such as U+FFFD REPLACEMENT CHARACTER."
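As a rough illustration of the three options the clause lists, here is a minimal sketch using Python's built-in UTF-8 decoder. The sample bytes and variable names are my own assumptions, not anything taken from the standard; the error handlers "strict", "ignore" and "replace" happen to line up with signaling an error, filtering the code unit out, and substituting U+FFFD.

```python
# Ill-formed UTF-8: 0xC3 opens a two-byte sequence, but 0x28 ("(") is not
# a valid continuation byte, so 0xC3 is an illegally terminated sequence.
data = b"ab\xc3(cd"  # sample input chosen for illustration

# Option 1: signal an error.
try:
    data.decode("utf-8", errors="strict")
    signaled = False
except UnicodeDecodeError:
    signaled = True

# Option 2: filter the offending code unit out.
filtered = data.decode("utf-8", errors="ignore")

# Option 3: represent the code unit with U+FFFD REPLACEMENT CHARACTER.
replaced = data.decode("utf-8", errors="replace")
```

In all three cases decoding resumes at the "(" that follows the bad byte, so only the single offending code unit is affected.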
Of course, any subsequent process that handles this text would have to
understand this convention, and not choke if handed a U+FFFD.
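For a module that does want to refuse such text at its interface, the check Philippe describes might look something like the sketch below. The function name and the choice of ValueError are hypothetical, made up for illustration only. Note that a check like this cannot tell a U+FFFD produced by a decoder apart from one that was already present in the source.

```python
REPLACEMENT_CHARACTER = "\ufffd"

def require_no_replacement(text: str) -> str:
    """Hypothetical interface guard: reject text containing U+FFFD."""
    index = text.find(REPLACEMENT_CHARACTER)
    if index != -1:
        # Report where the first marker was found so the caller can react.
        raise ValueError(f"U+FFFD found at index {index}")
    return text

# Text that went through a replacing decoder is rejected...
try:
    require_no_replacement(b"ab\xc3(cd".decode("utf-8", errors="replace"))
    rejected = False
except ValueError:
    rejected = True

# ...while clean text passes through unchanged.
clean = require_no_replacement("abcd")
```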
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/