From: erra srikrishna (email@example.com)
Date: Mon Dec 17 2007 - 04:31:30 CST
I need a clarification regarding the replacement character U+FFFD.
According to Unicode conformance clauses C4, C5 and C6,
if any noncharacter, unassigned, or low- or high-surrogate code points are present in Unicode input, they should be skipped or replaced with the U+FFFD character.
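To make the three categories concrete, here is a minimal sketch in Python (not ICU) that sorts a code point into surrogate, noncharacter, unassigned, or assigned. The function name classify is hypothetical, and the "unassigned" answer depends on the Unicode version bundled with the Python build, so treat it as an illustration rather than a conformance test.

```python
import unicodedata


def classify(cp: int) -> str:
    """Sort a code point into one of four rough categories (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # High and low surrogates: U+D800..U+DFFF.
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # General category Cn means "not assigned" in this Python's Unicode data.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    return "assigned"
```

Note that U+FFFD itself lands in the "assigned" bucket: it is an ordinary assigned character, unlike its neighbours U+FFFE and U+FFFF.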
According to Unicode conformance clause C12a,
any Unicode (UTF-8, UTF-16 and UTF-32) application should not accept ill-formed code unit sequences in its input. It should either signal an error or represent the ill-formed code unit sequence with a marker such as U+FFFD (REPLACEMENT CHARACTER).
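For UTF-16, "ill-formed" comes down to unpaired surrogate code units. The following is a rough Python sketch (not ICU code) that scans a list of 16-bit code units and reports the index of the first unpaired surrogate. The name first_ill_formed_utf16 and the use of -1 for "no error" are assumptions made for the illustration.

```python
def first_ill_formed_utf16(units):
    """Return the index of the first unpaired surrogate code unit, or -1 if none."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return i
        i += 1
    return -1
```

A scan like this runs on the raw code units, before any conversion, so a code unit 0xFFFD in the input is simply an ordinary unit and is never mistaken for an error.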
I am using IBM ICU, and ICU uses U+FFFD as its default replacement character. So I want to know: if the input itself contains a U+FFFD character, how should we treat that character?
That is, I want my application to return an error whenever any of the above sequences is found, but ICU by default replaces them with U+FFFD. So I am checking the converted text for U+FFFD, concluding that some invalid sequence occurred and ICU replaced it, and then generating an error.
But this does not work for input that genuinely contains U+FFFD. What should be done in that case: generate an error, or something else? I did not see any conformance clause specifying what should be done for U+FFFD.
Here I am mainly concerned with UTF-16 input.
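One way to see the ambiguity, and a possible way around it, is to ask the decoder to stop on errors instead of substituting. The sketch below uses Python's standard codecs as a stand-in for ICU, so it shows the idea rather than the ICU API; the helper name decode_or_error is hypothetical. With errors="strict" an unpaired surrogate raises an exception, while a genuine U+FFFD in the input decodes normally.

```python
def decode_or_error(data: bytes):
    """Decode UTF-16LE strictly; return (text, None) or (None, error message)."""
    try:
        return data.decode("utf-16-le", errors="strict"), None
    except UnicodeDecodeError as exc:
        return None, f"ill-formed UTF-16 at byte {exc.start}: {exc.reason}"


# Input that genuinely contains U+FFFD: well-formed, decodes without error.
genuine = "A\ufffdB".encode("utf-16-le")

# Input with an unpaired high surrogate (D800) between 'A' and 'B'.
broken = b"\x41\x00\x00\xd8\x42\x00"

# With substitution enabled, the broken input also ends up containing U+FFFD,
# so the two cases can no longer be told apart after conversion.
lossy = broken.decode("utf-16-le", errors="replace")
```

The point of the sketch is that the error is detected at decode time, so the application never needs to search the converted text for U+FFFD and a real U+FFFD in the input is not treated as an error.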
This archive was generated by hypermail 2.1.5 : Mon Dec 17 2007 - 10:59:37 CST