RE: Corrigendum #9 from Peter Constable on 2014-06-13 (Unicode Mail List Archive)

From: Peter Constable <petercon_at_microsoft.com>
Date: Fri, 13 Jun 2014 05:14:30 +0000

From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Karl Williamson
Sent: Wednesday, June 11, 2014 9:30 PM

> I have something like a library that was written a long time ago
> (not by me) assuming that noncharacters were illegal in open interchange.
> Programs that use the library were guaranteed that they would not receive
> noncharacters in their input.

I haven't read every post in the thread, so forgive me if I'm making incorrect inferences.

I get the impression that you think that Unicode conformance requirements have historically provided that guarantee, and that Corrigendum #9 broke that. If so, then that is a mistaken understanding of Unicode conformance.

Here is what has historically been said in the way of conformance requirements related to non-characters:

TUS 1.0: There were no conformance requirements stated. This recommendation was given:
"U+FFFF and U+FFFE are reserved and should not be transmitted or stored."

This same recommendation was repeated in later versions. However, it must be recognized that "should" statements are never absolute requirements.

Conformance requirements first appeared in TUS 2.0:

TUS 2.0, TUS 3.0:
"C5 A process shall not interpret either U+FFFE or U+FFFF as an abstract character."

TUS 4.0:
"C5 A process shall not interpret a noncharacter code point as an abstract character."

"C10 When a process purports not to modify the interpretation of a valid coded character representation, it shall make no change to that coded character representation other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points."

Btw, note that C10 makes the assumption that a valid coded character sequence can include non-character code points.
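For readers who want to see what "noncharacter code point" covers in practice, here is a minimal Python sketch. It assumes the usual definition of the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes). The function names `is_noncharacter` and `find_noncharacters` are made up for illustration and do not come from any standard library.

```python
def is_noncharacter(cp):
    """Return True if cp is one of the 66 Unicode noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    # U+FDD0..U+FDEF: a contiguous block of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF: the last two code points of every plane (n = 0..16).
    return (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(text):
    """Return a list of (index, code point) pairs for noncharacters in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if is_noncharacter(ord(ch))]
```

A library that wants to promise its callers noncharacter-free input would have to run a check like this itself at its boundary, since the conformance clauses quoted above do not make that promise for it.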

TUS 5.0 (trivially different from TUS 4.0):
C2 = TUS 4.0, C5

"C7 When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points."

TUS 6.0:
C2 = TUS 5.0, C2

"C7 When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences."

Interestingly, the revised C7 does not permit non-characters to be replaced or removed at all by a process that claims to have left the interpretation intact.
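To make the difference between the two wordings concrete, here is a rough Python sketch of a process that claims not to modify the interpretation of its input. The function name `pass_through` and its `delete_noncharacters` flag are hypothetical, and the noncharacter test assumes the usual set of 66 code points. This only illustrates the two behaviors discussed here; it is not a conformance checker.

```python
def _is_noncharacter(cp):
    # Assumed definition: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def pass_through(text, delete_noncharacters=False):
    """Forward text while claiming not to modify its interpretation.

    delete_noncharacters=True drops noncharacters, which the TUS 4.0 C10 and
    TUS 5.0 C7 wording explicitly allowed. With the TUS 6.0 C7 wording, only
    the default (leave the sequence untouched) fits the claim.
    """
    if not delete_noncharacters:
        return text
    return "".join(ch for ch in text if not _is_noncharacter(ord(ch)))
```

An implementation written against 4.0 or 5.0 that silently strips noncharacters corresponds to `delete_noncharacters=True`, and that is the behavior whose conformance claim is affected by the 6.0 wording.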

So, there was a change in 6.0 that could impact conformance claims of existing implementations. But there have never been any guarantees made _by Unicode_ that non-character code points will never occur in open interchange. Interchange has always been discouraged, but never prohibited.

Peter

Received on Fri Jun 13 2014 - 00:15:09 CDT
