From: Lars Kristan (lars.kristan@hermes.si)
Date: Wed Dec 15 2004 - 08:04:02 CST
Ooooops, correction:
In response to Marcin 'Qrczak' Kowalczyk
>> Question: should a new programming language which uses Unicode for
>> string representation allow non-characters in strings? Argument for
>> allowing them: otherwise they are completely useless, except
>> U+FFFE for BOM detection. Argument for disallowing them: they make
>> UTF-n inappropriate for serialization of arbitrary strings, and thus
>> non-standard extensions of UTF-n must be used for serialization.
I wrote:
My opinion:
> It should allow them and process them usefully. Furthermore, this
> 'usefully' should not be up to developers to discover. It should be
> researched, described, well, in the end even standardized. IMHO, UTC
> should consider leading this process, even if it does not end with
> anything standardized in the Unicode Standard.
>
> Validation should be completely separated from processing. IMHO.
I wasn't paying attention to what Marcin wrote, namely the term
"non-characters".
What I wrote applies to invalid sequences and surrogates, not to non-characters.
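To make the distinction concrete, here is a minimal Python 3 sketch that tells the two classes of code points apart. The helper names (is_noncharacter, is_surrogate, classify, utf8_round_trips) are made up for illustration and come from no library, and the round-trip check only shows what Python's built-in UTF-8 codec happens to do, not what any language ought to do.

```python
def is_noncharacter(cp: int) -> bool:
    # The 66 noncharacters: U+FDD0..U+FDEF, plus the last two code
    # points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, ...).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_surrogate(cp: int) -> bool:
    # U+D800..U+DFFF are reserved for UTF-16 and are not scalar values.
    return 0xD800 <= cp <= 0xDFFF


def classify(cp: int) -> str:
    # Hypothetical helper: label a code point by the class discussed above.
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if is_surrogate(cp):
        return "surrogate"
    if is_noncharacter(cp):
        return "noncharacter"
    return "other"


def utf8_round_trips(s: str) -> bool:
    # True if Python's strict UTF-8 codec can encode s and decode it back.
    try:
        return s.encode("utf-8").decode("utf-8") == s
    except UnicodeError:
        return False
```

With Python's strict codec, a string holding the noncharacter U+FFFF round-trips through UTF-8, while a string holding the lone surrogate U+D800 is rejected on encoding. That is the sense in which the two cases are different problems.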
Lars